In the long history of speculation on the causes of explosive outbreaks
of disease, William Farr and the 19th century mark a period
of transition from philosophical to mathematical models of the behavior 
and course of epidemics. Farr’s period of activity falls, roughly, be- 
tween Schleiden’s and Schwann’s announcement of the cell theory and 
Pasteur, Koch, and the science of bacteriology. In the early years of 
Farr’s century, Hippocratic epidemiology with the philosophically elegant 
modifications of Galen still pervaded medical thinking and the perplexing 
tangle of spread of infectious disease through the media of the common 
carrier, direct contact, or arthropod host had yet to be unravelled. 

In this setting, Farr’s recognition that the orderliness of epidemic
patterns might be described in mathematical form could be implemented
only by an empirical mathematical approach. Development of a mathematical
model providing a mechanistic explanation of the observations
required further understanding of specific biological causes. 

In introducing his empirical description of the English smallpox 




146 ROBERT E. SERFLING 


outbreaks of the late eighteen-thirties Farr (1840) commented “The 
diseases of the epidemic class follow laws of their own; they remain 
nearly stationary during months, years, and as we learn from medical 
history, centuries; then suddenly rise like a mist from the earth and
shed desolation on nations, to disappear as rapidly or insensibly as they 
came. ... Epidemics have furnished much matter for discussion and 
still offer large scope for inquiry. They have been attributed to terres- 
trial emanations, to the influence of the stars, to mysterious changes in 
the atmosphere, to heat, to animalcules, to deteriorated food (and) to 
contagion ...”.

In the century since Farr set down these lines, the assembly of 
innumerable observations and scrutiny of their relationships have cleared 
away the speculative hypotheses of his day, but his comment, “Epidemics
have furnished much matter for discussion, and still offer large scope 
for inquiry,” remains as challenging to 20th century investigators as to 
earlier scientific generations. 


EMPIRICAL DESCRIPTIONS OF EPIDEMICS 


In the Second Report of the Registrar-General of England and Wales 
(1840) Farr commented on the regularity which appeared in the rise 
and decline of epidemics, and after noting some of the causes which had 
been proposed to account for the phenomena, turned to a discussion 
of the English smallpox epidemic of 1837-1839. Smallpox deaths by
quarter during that period were 2,513 in the third quarter of 1837,
rose to a peak of 4,489 in the second quarter of 1838, and declined to
1,730 in the final quarter of 1839. Farr smoothed these data by means
of a moving average of successive quarterly deaths and calculated the 
second ratios of deaths in successive quarters. Since the second ratios 
(ratio of ratios) were of approximately the same magnitude, he assumed 
that they could be considered constant and on that assumption calculated 
an expected series of deaths. The results of his calculations (Figure 1) 
were in reasonable agreement with the observed series. His method was 
equivalent to fitting a normal frequency curve to the smoothed smallpox 
death frequencies, although he did not mention the relationship in his 
report. 
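
The equivalence between constant second ratios and a normal curve can be checked numerically. The following is a minimal sketch in Python, not Farr’s own computation: the function names and the curve parameters (peak, center, width) are hypothetical, chosen only for illustration. On equally spaced ordinates of a normal curve the logarithm is quadratic in time, so every second ratio equals exp(-1/width^2).

```python
import math

def second_ratios(series):
    """Return the ratio of ratios of successive terms of a series."""
    first = [b / a for a, b in zip(series, series[1:])]
    return [b / a for a, b in zip(first, first[1:])]

def normal_ordinates(peak, center, width, times):
    """Ordinates of a normal (Gaussian) curve at the given times."""
    return [peak * math.exp(-((t - center) ** 2) / (2 * width ** 2)) for t in times]

# Hypothetical curve (not Farr's data): peak of 4,000 deaths at quarter 4,
# with a width (standard deviation) of 2.5 quarters.
curve = normal_ordinates(4000.0, 4.0, 2.5, range(10))
ratios = second_ratios(curve)

# Every second ratio of a normal curve equals exp(-1 / width**2).
expected = math.exp(-1 / 2.5 ** 2)
print(ratios[0], expected)
```

Conversely, a series built by holding the second ratio constant has a quadratic logarithm and is therefore a sequence of normal-curve ordinates, which is the sense in which Farr’s method amounts to fitting a normal curve.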

Farr promised to develop these studies but according to Brownlee 
(1915b) no further work was published until Farr wrote the London 
Daily News a “vivid and characteristic” letter (published February 17,
1866), concerning the current outbreak of cattle plague in England. 
The outbreak had begun toward the end of 1865 and the number
of cases increased steadily each month. Early in 1866, a Mr. Lowe in 
a speech in the House of Commons had stated, “If we do not get the 
disease under control by the middle of April, prepare yourself for a 
calamity beyond all calculations. You have seen the thing in its infancy. 
Wait and you will see the averages, which have been thousands, grow 
to tens of thousands, for there is no reason why the same terrible law 
of increase which has prevailed hitherto should not prevail henceforth.” 
Farr commented, “No one can express a proposition more clearly than

[Fig. 1. Farr’s Empirical Description of the Decline of the Smallpox Epidemic in England, 1837-1839. Recorded smallpox deaths by quarters, England, 1837-1839; data used by Farr in his calculations.]


Mr. Lowe, but the clearness of a proposition is no evidence of its worth,” 
gave his own views on the behavior of epidemics, and in accordance with 
them, estimated the probable course of the outbreak. 

His mathematical approach this time employed the assumption that 
the third ratios of successive pairs of frequencies of reported cases during 
four-week periods were constant, but he did not explain why he went 
to third ratios instead of using second ratios as he had in the 1840 
report. The strength of his convictions may be judged by the fact that 
he employed data (Table 1) from only four periods, allowing him a 
single third ratio on which to base his prediction of the end of the 
epidemic. 
Farr’s forecast placed the peak of the epizootic earlier than it actually
occurred, with a more rapid decline, but nevertheless adequately sup- 
ported his views on the basic regularity underlying the progress of an 
epidemic. 

Four decades elapsed between Farr’s work of 1866 and the next con- 
sequential publication. Brownlee (1906) notes that Evans (1875) tried 
to use Farr’s methods on the English smallpox outbreak of 1871-72 but 
with little success (Ross, 1916), and it was not until early in the 
twentieth century that we find an extensive resumption of the work 
initiated by Farr. 

TABLE 1

Farr’s approximation to the English rinderpest outbreak of 1865-66

PERIODS OF FOUR      REPORTED     FARR’S CALCULATED     FINAL REPORTED
WEEKS ENDING         CASES        CASES                 CASES

1865  Nov.  4         9 597             9 597                9 597
      Dec.  2        18 817            18 817               18 817
      Dec. 30        33 835            33 835               33 835
1866  Jan. 27        47 191            47 191               47 287
      Feb. 24                          43 182               57 004
      Mar. 24                          21 927               27 958
      Apr. 21                           5 226               15 856
      May  19                             494               14 734
      June 16                              16                5 000 (about)
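
The third-ratio procedure can be retraced from the four reported figures in Table 1. The sketch below is a reading of the method as described in the text, not a transcription of Farr’s working; the function name is hypothetical. It forms the three first ratios, the two second ratios and the single third ratio, then extends the series by holding the third ratio constant.

```python
def farr_third_ratio_forecast(observed, steps):
    """Extend four observed counts by holding their single third ratio constant."""
    if len(observed) != 4:
        raise ValueError("exactly four observations are needed")
    first = [b / a for a, b in zip(observed, observed[1:])]   # three first ratios
    second = [b / a for a, b in zip(first, first[1:])]        # two second ratios
    third = second[1] / second[0]                             # one third ratio
    value, ratio1, ratio2 = float(observed[-1]), first[-1], second[-1]
    forecast = []
    for _ in range(steps):
        ratio2 *= third    # next second ratio
        ratio1 *= ratio2   # next first ratio
        value *= ratio1    # next four-week count
        forecast.append(value)
    return forecast

# Reported cases for the four-week periods ending Nov. 4 to Jan. 27 (Table 1).
reported = [9597, 18817, 33835, 47191]
forecast = farr_third_ratio_forecast(reported, 5)
print([round(x) for x in forecast])
```

Run on these four figures, the extension comes out close to the column of Farr’s calculated cases in Table 1, which suggests that this reading matches his procedure.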


For nearly a quarter of a century Dr. John Brownlee’s imagination 
was fired with the hope of deducing the laws governing epidemics from 
study of the properties of equations which would fit recorded outbreaks 
of disease. In 1906 he published a paper on the theory of an epidemic 
in which he stated the conclusions drawn from an extensive study of the 
fit of Pearson’s system of frequency curves to the curves of epidemics. 
The epidemics examined were collected from many sources, and among 
others, included plague in London in 1563 and 1665, smallpox in Boston 
in 1721, in Glasgow in 1794, in Gloucester in 1896, and measles, diarrhea, 
scarlet fever, influenza and yellow fever from a number of places during 
the years 1857-1902. 

For these epidemics he computed and published variances and third 
and fourth moments. At a later date (Brownlee, 1918) he published
similar statistics for outbreaks of plague and mentioned that he had also
calculated them for outbreaks in London, Alexandria, and more than
100 others in Hongkong, Sydney, and Indian cities.
In these studies he observed that the curve of his epidemics was
either symmetrical or had, on the average, small positive skewness,¹ and
concluded that a Pearson Type IV frequency curve would fit a majority 
of his observations. He apparently was familiar with the current hypo- 
thesis (Hamer, 1906) that the progress of an epidemic is regulated by the 
number of susceptibles and the rate of contact between infectious cases 
and susceptibles, and knew that direct calculation of successive genera- 
tions of cases, on this hypothesis, led to an epidemic with negative skewness.
As he rarely found negative skewness in his observations on
epidemics, Brownlee set up a hypothesis that the waxing and waning 
of an epidemic was primarily the result of a biological change in the 
“infectivity” of the organism producing the disease, and on this assumption
employed various methods of arriving, inductively, at the epidemic
equation he had obtained empirically. 

With this concept guiding him he succeeded in deriving the normal 
frequency curve by assuming that successive generations of an epidemic 
would be proportional to a geometrical progression of two factors: one 
factor representing a geometric increase in cases from one generation 
to the next; the other representing a geometric decrease in the organism’s
infecting power from one generation to the next. In a later paper 
(Brownlee, 1911) the latter concept was expanded to a more elaborate 
and speculative hypothesis. 
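
One way to read this construction is sketched below in Python. It is an illustration under stated assumptions, not Brownlee’s own derivation: the function name and the numerical values of the growth and decay factors are hypothetical. Each generation multiplies the preceding one by a fixed growth factor and by an infectivity that itself shrinks geometrically, so that the logarithm of the case count is quadratic in the generation number, as on a normal curve.

```python
def brownlee_generations(initial_cases, growth, decay, generations):
    """Cases per generation when infectivity shrinks geometrically each generation."""
    cases = [float(initial_cases)]
    infectivity = 1.0
    for _ in range(generations):
        cases.append(cases[-1] * growth * infectivity)
        infectivity *= decay
    return cases

# Hypothetical factors: threefold increase per generation, infectivity
# falling to 0.8 of its previous value each generation.
cases = brownlee_generations(10, growth=3.0, decay=0.8, generations=12)
first = [b / a for a, b in zip(cases, cases[1:])]
second = [b / a for a, b in zip(first, first[1:])]
peak_generation = cases.index(max(cases))
print(peak_generation, second[0])
```

The second ratios are all equal to the decay factor, which ties this construction to Farr’s assumption of constant second ratios.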

The hypotheses which Brownlee finally developed were first presented 
(Brownlee, 1915a) in response to a paper in which Ross (1915) 
questioned Brownlee’s ideas on change in infectivity of the organism. 
Although Ross doubted that such a biological change in infecting power 
occurred, he conceded the possibility of a change in likelihood of infec- 
tion in the broad sense, as a result of various combinations of change 
in climate, environmental conditions, and composition and habits of 
human or vector populations. Brownlee granted that biological loss of 
infecting power of the organism could not be considered the sole cause 
of the course of an epidemic, but nevertheless, held it to be the most 
important cause. 

¹ Dr. Philip Sartwell has observed that Brownlee was not always too discriminating
in his choice of epidemics, and included some, apparently common cause
outbreaks, in which the epidemic curve represents the distribution of varying
lengths of incubation periods (Sartwell, 1950) rather than the result of successive
generations of spread through contact.

* Wilson (1945) presented Brownlee’s derivation.

Some years later (Brownlee and Greenwood, 1926), in a recapitulation
of the results of his studies, Pearson’s symmetrical Type IV curve
is mentioned as a curve found by trial, apart from theory, to describe
the majority of epidemics. In this publication Brownlee still defended
the concept of change in infective power of the organism. However, 
here as also in his 1918 paper, he recognizes that the attempt to obtain 
positive confirmation of an epidemic theory by fit of a theoretical curve 
to observed data does not afford convincing support of the theory. 
Brownlee, and later Wilson (Wilson and Worcester, 1945a) found that 
quite different assumptions may lead to epidemic curves of a very 
similar shape. 

Empirical investigation of the properties of the epidemic curve was 
not pursued after Brownlee’s time, although Wilson (1925) used the 
percentage cumulative distribution of scarlet fever cases in a 1923-1924 
Providence, R. I., outbreak to compare a normal distribution curve grid 
with a logistic distribution curve grid. In this instance the cumulative 
case curve followed the normal distribution quite closely.


DETERMINISTIC EPIDEMIC THEORY 


Although Brownlee pursued his empirical investigations with under- 
standing of the many specific advances in epidemiology and bacteriology 
made since Farr’s day, he did not attempt their use a priori in develop- 
ment of his mathematical theory. Instead, from his observations on the 
nature of epidemic curves, he sought to deduce underlying biological
causes which would lead to equations of the form he had found to 
describe the course of recorded epidemics. Later workers, starting 
directly with accepted epidemiological relationships, have sought in 
different ways to construct mathematical models whose characteristics 
could be compared with field observations. 

Sir Ronald Ross used the latter procedure to work out a mathematical 
model of the epidemiology of malaria, and later used similar methods 
to develop a mathematical epidemic theory of greater generality. 

Ross stated that his concepts of epidemic theory were developed in 
the course of malaria investigations conducted in 1899 and the following 
decade. In 1911 in the second edition of his work The Prevention of 
Malaria, he presented, in the text, a difference equation derived from 
recognized factors in malaria epidemiology. These included the following 
characteristics of a stationary human population at a given time: (1) 
the proportion affected with malaria, (2) the proportion affected and 
also infective, and (3) the rate of recovery among the affected. For the 
anopheline population Ross included (1) the number of local anophelines
capable of carrying malaria, (2) the proportion of these which would 
succeed in biting an infected person, (3) the proportion of the latter 
which would succeed in maturing gametes, and (4) of these, the pro- 
portion which would succeed in infecting an uninfected person. 

Ross’ practical bent enabled him to get around the difficulties of 
obtaining exact quantitative estimates of the constants and to forge 
ahead by using arbitrary estimates which he thought were reasonable. 
By thus stepping over obstacles which might have deterred another, he 
arrived at a working mathematical model from which useful inferences 
could be drawn. 


Ross’ estimates (as modified by Lotka, 1923b) were 


s, proportion of anophelines which, having bitten infected
   humans, succeed in maturing parasites .................... 1/3
i, proportion of malarious human population with ga- ...
r, proportion of recoveries per month at instantaneous
   rate derived from Ross’ assumption that 50 per cent of
   infected persons would recover in three months (Lotka’s ...) 0.231
a, the number (per person) of different mosquitoes, per
   month, capable of carrying malaria (arbitrary value ...

Of these constants, s, i, and r represent inherent biological charac- 
teristics of the malaria plasmodium, anopheline mosquito and human 
hosts. The value of the constant, b, depends on biological characteristics 
of a particular species and environment and activities of a particular 
human population. 

Ross’ basic difference equation was 


$$m_{n+1}\,p = m_n p + b^2 sai\,(1 - m_n)\,m_n p - r m_n p \qquad (1)$$


in which $m_n$ is the proportion of the population affected with malaria
in period $n$, and $p$ is the population of the area. Hence the number
of cases in period $n + 1$ is equal to the number of cases in period $n$,
plus the new cases, $b^2 sai(1 - m_n)m_n p$, less the recoveries, $r m_n p$. It will
be noted that the expression for the number of new cases includes the
factor $b$ twice, since the mosquito must bite both an infected person
and then an uninfected person. 

In an appendix to his 1911 work, Ross presented a model employing 
a pair of differential equations, referring respectively to incidence of
malaria in the human population, and in the mosquito population. 

Lotka (1923b) reviewed Ross’ work, studied his equations in detail 
and concluded that the single difference equation model included an 

[Fig. 2. Malaria Equation of Ross (1911). Ordinates calculated from integral of Lotka’s (1923) differential equation $\frac{dm}{dt} = b^2 sai\,m(1 - m) - rm$.]

implicit assumption that malaria rates in the human and mosquito 
populations remained proportional. Expressing Ross’ model in terms of 
differentials, Lotka obtained a differential equation which leads readily to 
an integral equation of a type similar to the “logistic” used to describe
population growth. The characteristics of this curve, for selected values 
of m are shown in Figure 2. On Ross’ assumptions the malaria rate in 
an area will reach an equilibrium value (determined by the constants, 
b, s, i, r, and a), which is independent of the initial malaria rate. 
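
The independence of the equilibrium from the starting rate can be illustrated by iterating Ross’ difference equation directly. The following Python sketch takes s = 1/3 and r = 0.231 from the estimates quoted above; the values of b, i and a are hypothetical placeholders chosen for illustration, and the function names are not from the original papers. Setting the change per period to zero gives the equilibrium proportion 1 - r/(b²sai), provided that quantity is positive.

```python
def ross_malaria_rates(m0, b, s, a, i, r, periods):
    """Iterate m[n+1] = m[n] + b^2*s*a*i*(1 - m[n])*m[n] - r*m[n].

    The population p appears in every term of equation (1) and cancels.
    """
    k = b * b * s * a * i
    rates = [m0]
    for _ in range(periods):
        m = rates[-1]
        rates.append(m + k * (1.0 - m) * m - r * m)
    return rates

def ross_equilibrium(b, s, a, i, r):
    """Equilibrium malaria rate: zero if transmission cannot offset recovery."""
    k = b * b * s * a * i
    return max(0.0, 1.0 - r / k)

# s and r as quoted in the text; b, i and a are hypothetical values.
b, s, a, i, r = 0.25, 1.0 / 3.0, 96.0, 0.25, 0.231
low_start = ross_malaria_rates(0.01, b, s, a, i, r, 300)
high_start = ross_malaria_rates(0.90, b, s, a, i, r, 300)
print(low_start[-1], high_start[-1], ross_equilibrium(b, s, a, i, r))
```

Both starting rates settle at the same value, which is the behavior the curves of Figure 2 are described as showing.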

In later papers (Ross, 1915, 1916; Ross and Hudson, 1917a, b),
Ross developed a more generalized mathematical formulation which was
adaptable to the description of a variety of situations. Because of its
generality he coined the phrase “A Theory of Happenings,” and pointed 
out that his results would have applications in economics, sociology and 
other fields. His own developments were in the field of epidemiology and 
were derived from four basic differential equations: 


$$dP = v\,dt \cdot A + V\,dt \cdot Z \qquad (2)$$
$$dA = (v - h)\,dt \cdot A + (N + r)\,dt \cdot Z \qquad (3)$$
$$dZ = h\,dt \cdot A + (V - N - r)\,dt \cdot Z \qquad (4)$$
$$P = A + Z \qquad (5)$$


in which
P = number of persons in population
A = number of persons not affected
Z = number of persons affected
h = “happening element”
r = reversion rate
v, V = birth rate - mortality rate + immigration rate - emigration rate
in the non-affected and affected populations, respectively
N = birth rate in affected population.


When the assumption is made that h, the happening element, is 
constant, the solution of the differential equations for the case v = V 
is the simple exponential equation 


$$z = L - (L - z_0)\,e^{-Kt} \qquad (6)$$


in which $z = Z/P$, the proportion affected, $z_0$ is its value at $t = 0$, and $L$ and $K$ are functions of
h, N and r. This model is appropriate to describe the spread through 
a population of histoplasmosis and other diseases in which the individual 
is exposed to an environmental pathogen. 

In the case of contact infections Ross set the “happening element” h
equal to c·Z/P. In this relation, c is an arbitrary constant, so that the
“happening element” is proportional to the number affected. In this
case Ross’ curve for the proportion of affected individuals became the
logistic

$$z = \frac{L}{1 + \left(\dfrac{L}{z_0} - 1\right)e^{-Kt}} \qquad (7)$$

in which $z_0$ is the proportion affected at $t = 0$, and $K$ and $L$ are functions of c, r, v, V, and N.
For a contact disease in which “complete immunity” occurs, Ross’
result leads to exhaustion of susceptibles, a conclusion not in accord 
with current epidemiological thought. 

Ross’ equation for the new case rate was 


$$f = cx(1 - x) \qquad (8)$$


i.e., the rate of occurrence of new cases is jointly proportional to the 
proportion affected and the proportion not affected, at a given instant 
of time. 

Figure 3 represents the course of a disease such as histoplasmosis in 
which the chance of infection of an individual is not dependent on the 
number of infective individuals in the population. Figure 4 illustrates 
the situation in which an individual’s chance of becoming infected does 
depend on the number of infective individuals in the population. 
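
The contrast between the two situations can be sketched numerically. The Python functions below are closed-form solutions derived from equations (2) to (5) under simplifying assumptions that are not stated by Ross in this form: equal variation rates (v = V) and no births in the affected population (N = 0). The function names and the parameter values are hypothetical. With a constant happening element the proportion affected rises along a simple exponential toward h/(h + r); with h proportional to the proportion affected it follows a logistic curve toward (c - r)/c.

```python
import math

def independent_happenings(t, h, r, z0=0.0):
    """Proportion affected at time t when the happening element h is constant."""
    limit = h / (h + r)
    return limit - (limit - z0) * math.exp(-(h + r) * t)

def dependent_happenings(t, c, r, z0):
    """Proportion affected at time t when h = c*z (contact infection), c > r, z0 > 0."""
    rate = c - r
    limit = rate / c
    return limit / (1.0 + (limit / z0 - 1.0) * math.exp(-rate * t))

def new_case_rate(c, x):
    """Equation (8): new cases are jointly proportional to affected and unaffected."""
    return c * x * (1.0 - x)

# Hypothetical rates per unit time, for illustration only.
h, r, c, z0 = 0.2, 0.05, 0.8, 0.01
for t in (0, 5, 10, 20, 40):
    print(t, round(independent_happenings(t, h, r), 4),
          round(dependent_happenings(t, c, r, z0), 4))
```

The first curve rises fastest at the outset and flattens steadily; the second starts slowly, accelerates as affected individuals accumulate, and then levels off, giving the S-shaped form of the logistic.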

The generality of Ross’ initial equations has the advantage of per- 
mitting their use as a starting point for development of hypotheses 
depending on various assumptions with respect to effect of severity 
(fatality) of the disease, replenishment of susceptibles by births, and 
transmission by contact or by direct exposure to an environmental factor. 
For many purposes they offer a useful device for consideration of impor- 
tant factors determining the equilibrium conditions for a specific disease 
problem. Later investigators (Muench, 1934, and Turner, et al, 1950), 
have used similar methods to describe the development of immunization 
in a population. For a preliminary generalization, Ross’ basic differential 
equations provide the most convenient tool presently available. 

Recently G. Macdonald (1950a) with J. O. Irwin has extended Ross’ 
later theory (Ross, 1916) on “independent happenings” for application
to a problem in malariology. Macdonald found that Ross’ equation (6) 
when fitted to data obtained in malaria surveys resulted in improbable 
estimates of h, the “happening element,” or inoculation rate. Review 
of Ross’ work in relation to the biology of human malaria led to a con- 
clusion that Ross’ assumptions did not allow for the effect of “ super- 
infection ”—the coexistence of independent broods of parasites in the 
same individuals, usually among young children in hyperendemic areas. 
On the hypothesis of superinfection Macdonald and Irwin developed a 
revision of Ross’ equation which was subsequently (Macdonald, 1950b) 
tested against field data. Irwin’s development of Ross’ theory led to 
the relationship R = r - h, in which R represents the observed recovery
rate in the case of superinfection, r, the true recovery rate, and h, the 

[Fig. 3. Ross (1916) Independent Happenings.]
[Fig. 4. Ross (1916) Independent Happenings.]
[Labels on the figures: equivariate case, v = V; stationary population; x = proportion of population affected; years of exposure.]


inoculation rate. When specific mortality is negligible, the limiting 
value of the proportion of the population infected becomes h/r when 
h is less than r and unity when h exceeds r. The limiting value, under 
Ross’ assumptions, was h/(h +r). 
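
The two limiting values can be compared in a short Python sketch. It rests on an assumption not spelled out in the text: that superinfection changes Ross’ constant-h equation only by replacing the true recovery rate r with the observed rate R = r - h, taken as zero when h exceeds r. The function names and the inoculation rates are hypothetical; r = 0.005 per day is the estimate quoted in this review.

```python
def ross_limit(h, r):
    """Limiting proportion infected under Ross' assumptions."""
    return h / (h + r)

def macdonald_limit(h, r):
    """Limiting proportion infected when superinfection is allowed for."""
    return h / r if h < r else 1.0

def simulate_superinfection(h, r, days):
    """Daily Euler steps of dz/dt = h*(1 - z) - R*z with R = max(r - h, 0)."""
    observed_recovery = max(r - h, 0.0)
    z = 0.0
    for _ in range(days):
        z += h * (1.0 - z) - observed_recovery * z
    return z

r = 0.005  # true recovery rate per day, as quoted in the text
for h in (0.002, 0.010):  # hypothetical inoculation rates per day
    print(h, ross_limit(h, r), macdonald_limit(h, r),
          simulate_superinfection(h, r, 5000))
```

For an inoculation rate below r the simulated proportion settles at h/r rather than at Ross’ h/(h + r), and for a rate above r it approaches unity.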

Considering r as a biological invariant, and estimating its value at 
0.005 (per day) from data collected by Earle et al (1939) in an area
of low endemicity in Puerto Rico, Macdonald estimated the value of 
h, the inoculation rate, from several series of observations made on 
children in hyperendemic areas. The estimated values appeared more 
reasonable than those obtained earlier with Ross’ model. Macdonald 
noted that the same method might be useful in analysis of tuberculosis 
infection in a population. 

An essentially similar approach was undertaken by W. O. Kermack 
and A. G. McKendrick, in a series of papers (Kermack and McKendrick, 
1927, 1932, 1933, 1937, 1939 and McKendrick, 1940) in which they 
developed a mathematical epidemic theory and applied it to results of 
experimental epidemics in animal populations. Initial development of 
their differential equations proceeded from more general assumptions 
than were used by Ross and in their later papers they modified their 
system of equations in order to explore the effects of temporary immunity 
and specific and non-specific death rates. 

Their differential equations did not have simple solutions, and 
although they arrived at approximate integral equations for comparison 
with survivorship curves in the experimental mouse epidemics, their 
epidemiological conclusions (McKendrick, 1940) were drawn from con- 
sideration of equilibrium conditions. From these they derived algebraic 
formulations of the relationship between the “steady state ” number of 
susceptibles, the infection rate, the recovery rate and the specific and 
non-specific death rates. For a disease conferring complete immunity 
their formulation was: 


Number of susceptibles   greater than  $(d + l + p)/k$ :  Cases increase
                         equal to      $(d + l + p)/k$ :  Cases constant (endemic state)
                         less than     $(d + l + p)/k$ :  Cases decrease

d = specific death rate                 l = recovery rate
p = death rate from other causes        k = infection rate.


Thus, at equilibrium conditions, either an influx of susceptibles or an 
increase in the infection rate would favor occurrence of an outbreak, 
the former by raising the number of susceptibles above the threshold
(steady state) value; the latter, by lowering the threshold value. 
Crowding, for example, by increasing the frequency of contact could 
cause a rise in the infection rate. 
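
The threshold argument can be put in a few lines of Python. The sketch assumes the simplest mass-action form, in which infectives are gained at rate k·S·I and lost at rate (l + d + p)·I; this form, the function names and all numerical values are assumptions made for illustration and are not taken from the Kermack and McKendrick papers.

```python
def threshold_susceptibles(infection_rate, recovery_rate,
                           specific_death_rate, other_death_rate):
    """Steady-state number of susceptibles under the assumed mass-action form."""
    removal = recovery_rate + specific_death_rate + other_death_rate
    return removal / infection_rate

def case_trend(susceptibles, infection_rate, recovery_rate,
               specific_death_rate, other_death_rate):
    """Direction of change in cases for a given number of susceptibles."""
    threshold = threshold_susceptibles(infection_rate, recovery_rate,
                                       specific_death_rate, other_death_rate)
    if susceptibles > threshold:
        return "increase"
    if susceptibles < threshold:
        return "decrease"
    return "constant"

# Hypothetical rates: 300 susceptibles lie below the threshold at the lower
# infection rate, and above it once crowding doubles that rate.
print(case_trend(300, 0.001, 0.30, 0.05, 0.05))
print(case_trend(300, 0.002, 0.30, 0.05, 0.05))
print(case_trend(500, 0.001, 0.30, 0.05, 0.05))
```

The second and third calls correspond to the two routes to an outbreak named in the text: a lowered threshold and an influx of susceptibles.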

In the same year that John Brownlee published the results of his 
extensive examination of the statistical characteristics of epidemics and 
his conclusion that change in the “infective power ” of the infecting 
agent was of fundamental importance in determining the form of the 
epidemic curve, another paper (Hamer, 1906) presented the basic 
elements of an epidemic theory which in subsequent development has 
become the cornerstone of present concepts of the epidemiology of 
communicable disease. Hamer assumed that the number of new cases 
which develop from a given number of infectious cases would be pro- 
portional to: (1) the number of existing infectious cases, (2) the number 
of existing susceptibles, and (3) a constant depending on factors 
influencing contact of an infectious person with a susceptible. 

Using a very simple geometric approximation to obtain the necessary 
summations, Hamer developed a classic model of the epidemic curve of 
measles and the periodic recurrence of epidemics. 

Twenty-three years later, Soper (1929) presented a more elegant 
mathematical methodology but freely admitted that he considered his 
results a refinement rather than an addition to Hamer’s hypotheses. 
Soper built his model on the difference equation 


$$Z = \frac{x}{m}\,Z_{-1} \qquad (9)$$


in which $Z$ equals the number of cases at time $t_0$; $Z_{-1}$ equals the number
of cases one incubation period earlier at time $t_{-1}$; $x$ equals the number
of susceptibles at time $t_0$; and $m$ is a constant measuring overall
“contact rate,” defined as the “steady state” number of susceptibles,
a number of susceptibles such that one case at time $t_{-1}$ will generate one
new case at time $t_0$. Results of the Hamer-Soper model are shown
graphically in Figure 5. Soper took the incubation period as his unit 
of time, assumed infection to be instantaneous at contact and that an 
infected person immediately becomes infectious at the end of the incu- 
bation period. 
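
The recurrence of epidemics under this model can be sketched in Python. The bookkeeping for susceptibles is an assumption added for illustration: each period the susceptible stock gains a fixed number of recruits and loses the new cases. The function name and all numerical values are hypothetical.

```python
def hamer_soper(cases0, susceptibles0, m, recruits, periods):
    """Iterate Z[t+1] = (x[t] / m) * Z[t] and x[t+1] = x[t] + recruits - Z[t+1]."""
    cases, susceptibles = [float(cases0)], [float(susceptibles0)]
    for _ in range(periods):
        new_cases = susceptibles[-1] / m * cases[-1]
        cases.append(new_cases)
        susceptibles.append(susceptibles[-1] + recruits - new_cases)
    return cases, susceptibles

# Steady state: susceptibles equal m and cases equal recruits, so nothing changes.
steady_cases, steady_sus = hamer_soper(50, 1000, m=1000, recruits=50, periods=20)

# A 10 per cent excess of susceptibles sets off recurring waves about that level.
wave_cases, wave_sus = hamer_soper(50, 1100, m=1000, recruits=50, periods=60)
print([round(z, 1) for z in wave_cases[:15]])
```

With the susceptibles exactly at the steady-state number the series stays flat; any departure from it produces cases that swing above and below the recruitment level, one case generating more than one successor while susceptibles exceed m and fewer when they fall short.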

In examination of a long series of measles cases reported in Glasgow, 
Soper noted that there was much more variability in the amplitude of 
the crests than would be expected from his model and for further 
exploration studied a composite curve of the reported data for six
biennia, 1901-1912. The nature of the composite curve led him to
believe that an additional factor, seasonal variation in contact rate,
should be included in the model. His revision took the following form


$$Z = k_\theta\,\frac{x}{m}\,Z_{-1}, \qquad x = x_{-1} + A - Z$$


[Fig. 5. Hamer-Soper Model of Epidemic Periodicity. Curves: new cases and susceptibles; horizontal axis: time in incubation periods.]


in which the factor $k_\theta$ is proportional to the contact rate at time $\theta$, and
$A$ represents recruits to the susceptible population. After some preliminary
work on the composite curve, Soper estimated average monthly
values for $k_\theta$ from the Glasgow data for the period 1905 to 1916, during
which the biennial case totals were fairly stable. The monthly values
found were at a low of 0.75 in July, increased to a high of 1.25 in
October, slowly declined to 0.96 in May, and then fell abruptly to the 
July low of 0.75. These results seemed in accord with greater oppor- 
tunity for contact among children during the school year. Soper did 
not publish a calculated series for comparison with his observed data, 
but noted that even his revised model was too elementary to describe a 
particular series of epidemic waves. Among other factors he believed 
differences in contact rates within components of a community to be of 
importance. 

In 1928, Dr. Wade H. Frost (Wilson and Burke, 1942) presented a 
different development of Hamer’s hypotheses. Frost never published 
his results, and the present discussion is based on Wilson’s paper and 
on notes provided by Dr. Margaret Merrell. Frost’s special contribu- 
tion was recognition in his model of the possibility that a susceptible 
person might have one or more contacts with infectious persons, but 
through such multiple contacts only one new case would develop. In 
consequence, in each successive disease generation the number of new 
cases would be less than predicted by Hamer’s or Soper’s model.

According to Wilson, Frost let $p_i = 1/S_i$ be the chance that any
specified infectious person would have contact with any specified one
of the $S_i$ susceptibles in the i-th disease generation. Then $q_i = 1 - 1/S_i$
would be the chance that a particular susceptible would not have contact
with a specified infectious person. If there were $k_i$ possible contacts of
the susceptible with infectious persons the chance he would escape them
all would be

$$(1 - 1/S_i)^{k_i},$$

and the chance of one or more contacts with infectious persons would be

$$1 - (1 - 1/S_i)^{k_i}.$$

If, then, there were $S_i$ susceptibles in the i-th generation, the expected
number of new cases in the (i + 1)-th generation would be

$$C_{i+1} = S_i\left[1 - (1 - 1/S_i)^{k_i}\right] \qquad (10)$$

in which $k_i$ is assumed jointly proportional to $C_i$ and $S_i$, the number
of cases and susceptibles, respectively, in the i-th generation. Wilson
observed that by using the relationship


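$$(1 - 1/S_i)^{k_i} \approx e^{-k_i/S_i}$$

(when $S_i$ is sufficiently large), and writing $k_i = rC_iS_i$ for the assumed
proportionality, Frost’s equation could be simplified to the form

$$C_{i+1} = S_i\left(1 - e^{-rC_i}\right). \qquad (11)$$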
For some purposes it has been found convenient to replace $e^{-r}$ by a
parameter, $q$, to obtain

$$C_{i+1} = S_i\left(1 - q^{C_i}\right). \qquad (12)$$

The number of susceptibles in the (i + 1)-th generation will then be

$$S_{i+1} = S_i - C_{i+1} + A \qquad (13)$$


in which A represents the number of recruits to the susceptible popu- 
lation during the i-th generation. 

With these two equations the course of a hypothetical epidemic can 
be calculated, generation by generation. 
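
The generation-by-generation calculation can be sketched in Python from equations (12) and (13). The function name and the starting values are hypothetical and chosen only for illustration.

```python
def reed_frost_deterministic(cases0, susceptibles0, q, recruits=0.0, generations=10):
    """Iterate C[i+1] = S[i] * (1 - q**C[i]) and S[i+1] = S[i] - C[i+1] + recruits."""
    cases, susceptibles = [float(cases0)], [float(susceptibles0)]
    for _ in range(generations):
        new_cases = susceptibles[-1] * (1.0 - q ** cases[-1])
        cases.append(new_cases)
        susceptibles.append(susceptibles[-1] - new_cases + recruits)
    return cases, susceptibles

# Hypothetical closed group: one initial case, 100 susceptibles, q = 0.98,
# and no recruits during the outbreak.
cases, susceptibles = reed_frost_deterministic(1, 100, 0.98, generations=15)
for generation, (c, s) in enumerate(zip(cases, susceptibles)):
    print(generation, round(c, 2), round(s, 2))
```

Because each susceptible can become a case only once, the number of new cases can never exceed the susceptibles remaining, which is the feature that distinguishes this form from the simple product used by Hamer and Soper.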

It would be advantageous to study the Soper and Reed-Frost models
through use of integral, rather than difference equations and Wilson 
and Worcester (1944, a, b) investigated the possibilities. They showed 
that the derivative of the “logistic ” curve could be obtained as a first 
approximation to a differential equation employed by Soper, and also 
obtained an integral step function which could be used to approximate 
the Reed-Frost curve. For the latter they deduced the mean, variance 
and third moment. 

At a later date (1945, a,b,c,d) they returned to further study of 
Soper’s difference equation 

$$C_{i+1} = \frac{S_i}{m}\,C_i \qquad (14)$$


making an empirical modification by setting

$$C_{i+1} = \left(\frac{S_i}{m}\right)^{p} C_i.$$


From a study of the application of this form to Hedrich’s (1930) data 
and to data the authors collected on other epidemics they noted that p 
was usually greater than unity in the epidemics they studied. Since 
their modification of Soper’s model was an empirical one, the biological 
significance of their results is not clear. 

Dr. Helen Abbey (1951) undertook a study of the fit of the Reed- 
Frost model to a series of recorded epidemics in institutions, employing 


the form

$$C_{i+1} = S_i\left(1 - q^{C_i}\right).$$


Using maximum likelihood methods, q was estimated from data for all 
generations of each epidemic, and expected values were computed. It 
was found that a consistent discrepancy occurred: too few cases in early
generations and too many in later ones. However, when both the
initial number of susceptibles and q were estimated from the data,
agreement of theory and observation was improved. The estimated 
numbers of susceptibles were found to be smaller than the recorded 
numbers, suggesting that incomplete medical histories may have resulted 
in inclusion of immunes among presumed susceptibles. 

For critical appraisal of epidemic models, an epidemiological Tycho
Brahe who will collect data of unquestionable accuracy is needed. 


STOCHASTIC MODELS 


In the deterministic model it is assumed that with a given set of 
initial conditions only a single sequence of events will ensue. This is 
clearly an unrealistic hypothesis, since during each disease generation, 
a variety of factors may intervene to alter spread of the disease to 
potential victims of the next generation. If the assumption is made 
that the resultant effect of factors influencing the spread of disease 
from one generation to the next may be regarded as a random process, 
probability concepts (stochastic theory) may be included in the struc- 
ture of the model. 

In recent years a few ventures have been made into application of 
stochastic theory to earlier deterministic epidemic models. Bartlett 
(1949) examined a deterministic equation based on the Kermack and 
McKendrick models and observed that for large numbers of susceptibles 
the deterministic mean would approximate the curve of the stochastic 
mean. Bailey (1950) drew an analogy between the stochastic process 
and the fact that in a population the epidemic curve is the sum of a 
large number of epidemics in small groups of the population. Using 
the same deterministic model as Bartlett, he calculated the expected 
course of an epidemic according to the deterministic equation, and 
compared it with the curve of the stochastic mean of groups of size ten 
and twenty. The deterministic curve was symmetrical, whereas the 
stochastic mean curve was positively skewed—a result in accord with 
Brownlee’s observations. 

Abbey (1951) used a stochastic Reed-Frost model in analysis of epi- 
demics of measles in families. The calculated and observed results were 
not in agreement, possibly, according to Abbey, due to differences in 
within-family contact rates. 
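
The stochastic idea can be illustrated with a short simulation. The sketch below uses the usual chain-binomial reading of the Reed-Frost model, in which each susceptible independently becomes a case of the next generation with probability 1 - q^C; whether this matches Abbey's exact formulation is an assumption, and the function name, the family of one case and four susceptibles, and the value p = 0.3 are hypothetical choices made only for illustration.

```python
import random


def stochastic_reed_frost(c1, s1, p, rng):
    """One random realization of a chain-binomial Reed-Frost epidemic.

    c1, s1 : initial numbers of cases and susceptibles
    p      : probability of adequate contact between two individuals
    rng    : a random.Random instance (passed in so runs are repeatable)
    """
    q = 1.0 - p
    cases, susceptibles = [c1], [s1]
    # Stop when there is nobody left to infect or nobody left to be infected.
    while cases[-1] > 0 and susceptibles[-1] > 0:
        risk = 1.0 - q ** cases[-1]          # chance a susceptible is infected
        new_cases = sum(1 for _ in range(susceptibles[-1]) if rng.random() < risk)
        cases.append(new_cases)
        susceptibles.append(susceptibles[-1] - new_cases)
    return cases, susceptibles


rng = random.Random(1951)                     # arbitrary seed
totals = [sum(stochastic_reed_frost(1, 4, 0.3, rng)[0]) for _ in range(1000)]
for size in range(1, 6):
    print(size, totals.count(size))           # frequency of each total epidemic size
```

Repeated runs from the same initial conditions give different sequences of cases, which is exactly the feature the deterministic model cannot represent.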


DISCUSSION AND SUMMARY


William Farr’s recognition that the orderly rise and fall of the 
epidemic wave could be described mathematically preceded biological 
description of the causal mechanism, but his work, and the later heroic 
efforts of Brownlee to find in empirical equations a solution to the 
problem of epidemics did much to delineate the problem and stimulate 
thought of later workers. 

At the turn of the twentieth century Hamer defined basic quanti- 
tative relationships between the number of new cases and the number 
of susceptibles. Although Hamer’s concepts have been extended, they 
have not been fundamentally modified by later investigators. Soper’s 
excellent exposition and development of Hamer’s hypotheses, the parallel 
work of Frost and Reed, and Wilson’s related studies have resulted in a 
reasonably simple model of contagious disease epidemics with many 
theoretical applications. 

Proceeding in a different manner, Kermack and McKendrick 
developed a more general theory which also led to useful theoretical 
deductions. Had Kermack and McKendrick succeeded in obtaining 
integral equations for study by the epidemiologist it is likely that their 
results would have been more widely used. 

The most extensive practical applications of epidemic theory have 
stemmed from the work of Ross. Muench, and Turner and his coworkers 
have used equivalent concepts in studies of development of immunity in 
a population. However, the malariologists seem to have been most 
successful in direct applications. It is probable that trustworthy data 
on population parasitemia have enabled the latter to proceed further 
in applied epidemic theory than workers in other fields of epidemiology. 

With sound data the parameters of epidemic models could be given 
more thorough study. In models of contagious disease outbreaks it is 
certainly an oversimplification to represent frequency of contact by a 
single parameter which must measure the resultant effect of such diverse 
factors as different cultural and social patterns, varying environmental 
conditions, degrees of crowding, seasonal differences in exposure, and 
others. Work along the lines of Soper’s efforts to separate out the effect
of seasonal differences in contact rate could, with adequate observations,
be extended to other factors.

Problems of counting cases and susceptibles also need further work. 
Improvement in laboratory techniques, clinical diagnosis, and field 
methods of collecting information afford means of obtaining data which
will provide more rigorous and conclusive tests of theory. 

Current efforts to include the effect of chance variation in the mathe- 
matical formulation reflect recent progress in statistical theory. The 
efforts of Bartlett and Bailey to introduce these concepts into epidemic 
theory have been based on a very simple biological model (equivalent to
the chemical “law of mass action”), including only two parameters, the
infection rate and initial number of susceptibles. The equivalent deter- 
ministic development leads to the “logistic” as the integral curve. 
Bailey recognized that this model was not realistic and that the epi- 
demiological problem was unduly simplified. However, this does not 
minimize the importance of their work in initial application to epidemic 
theory of a mathematical procedure through which consideration may 
be given to the effect of random variation on progression of the successive 
generations of an epidemic wave. 

During the last century mathematical epidemic models have reflected 
advances in biological theory. As scientific understanding broadened, 
re-examination of the quantitative model has sharpened epidemiological 
thought by requiring precise and accurate statement of the biological 
hypothesis. Inferences drawn from the model have contributed to 
formulation of new epidemiological hypotheses. However, advance in 
epidemic theory depends also upon tests of hypotheses and a crucial 
test must be based on complete and accurate data. In the past, these 
have been inadequate. 


SOME MATHEMATICAL DEVELOPMENTS ON THE
EPIDEMIC THEORY FORMULATED BY 
REED AND FROST * 


BY JOAQUIM DE OLIVEIRA COSTA MAIA 
University of Porto, Portugal 


INTRODUCTION 


Dr. Lowell J. Reed and the late Dr. Wade Hampton Frost
jointly developed a mathematical theory of epidemics, which was
later further expanded by Reed. None of their work has been published,
but some of it has been utilized in the teaching of the departments of
Biostatistics and Epidemiology of The Johns Hopkins School of Hygiene
and Public Health, especially in the course given jointly by these depart-
ments (Biostatistics 9-Epidemiology 2). Of the work done by Reed
and Frost, only that portion which is presented in this course is available 
to the writer; he is unacquainted with any further extensions which 
they may have developed. As presented, the theory applies only to simple 
situations as illustrated by an outbreak of measles in a closed group. 
It offers a reasonably good explanation of the course of such epidemics as 
well as of the mass behaviour of measles in the population of Baltimore. 

A brief summary will now be given of the Reed-Frost theory upon 
which this paper is based. The basic reasoning is as follows: 

In a closed population of size N, within which people intermingle
fairly uniformly, it is plausible to assume that, in a certain period of
time t, every individual will have with other individuals about the same
number of contacts exceeding a given degree of intimacy. If the degree
of intimacy be postulated to be sufficient for a patient with a certain
contagious disease to transmit the disease to a susceptible person, this
number of contacts, K, will be the average number of contacts adequate
for transmission of the disease (or, simply, adequate contacts) per
individual per time t. If we make t as large as the incubation period
of the disease and each case is infectious for no longer than t, the
individuals infected during one period will themselves be infectious
during the next one. We have thus defined generations for our epidemics
by a somewhat arbitrary delineation of the infectious period.

* This paper has been excerpted by the editor from the dissertation presented
by Dr. Maia for the degree of Doctor of Public Health at Johns Hopkins
University.

If K is the average number of adequate contacts per individual per
time t, and the population size is N, then the probability of adequate
contact between any two given individuals during time t will be

$$p = \frac{K}{N-1} \qquad (1)$$

and

$$q = 1 - p \qquad (2)$$

will be the probability of any given individual avoiding adequate contact
with any other given individual during time t.

Thus, if our population is at any time, t, composed of cases, $C_t$,
susceptibles, $S_t$, and immunes, $I_t$, the probability of any given individual
avoiding contact with any one of the cases will be q and, with all the $C_t$
cases, will be

$$Q_t = q^{C_t} \qquad (3)$$

and the probability of any given individual having at least one adequate
contact with any of the cases will be

$$P_t = 1 - Q_t = 1 - q^{C_t}. \qquad (4)$$


In the transmission of disease we are interested only in the contacts
between cases and susceptibles. Thus, in the next time period (t + 1)
we will have

$$C_{t+1} = S_t\,(1 - q^{C_t}). \qquad (5)$$

* It may be parenthetically noted that the simpler equation

$$C_{t+1} = K\, C_t\, \frac{S_t}{N-1}$$

is correct where $C_t = 1$ but that for larger values of $C_t$ this equation in effect
fails to allow for the fact that if multiple cases have contact with the same
susceptible, only one case can arise in the next generation as a result of these
contacts. Thus, the number of cases in time t + 1 will be overstated whenever
the value of $C_t$ is large in relation to $S_t$. This sometimes leads to the absurd
result that $C_{t+1} > S_t$.
Assuming that permanent immunity is conferred by one attack of the
disease, it will follow that:

$$S_{t+1} = S_t - C_{t+1} \qquad (6)$$

and

$$I_{t+1} = I_t + C_t. \qquad (7)$$

The following illustration (Table 1) will serve to show how a simple
theoretical epidemic is constructed, assuming the introduction of one
case in time t = 1 into a population composed of 100 susceptibles,
p being .04.


TABLE 1


Theoretical epidemic following the introduction of 1 case into 100 susceptibles,
where p = .04.


  Time       No. of   No. of         No. of    Total
  interval   cases    susceptibles   immunes   population
  t          C        S              I         N            q^C     1 - q^C
  1           1       100             0        101          .960    .040
  2           4        96             1        101          .849    .151
  3          14        82             5        101          .565    .435
  4          36        46            19        101          .230    .770
  5          35        11            55        101          .240    .760
  6           8         3            90        101          .721    .279
  7           1         2            98        101          .960    .040
  8           0         2            99        101
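
The construction of Table 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `reed_frost` is hypothetical, and the rounding of each generation's cases to the nearest whole number is an assumption inferred from the integer entries of the table rather than something stated in the text.

```python
def reed_frost(c1, s1, p, generations):
    """Deterministic Reed-Frost epidemic built from equations (5), (6) and (7).

    Cases are rounded to whole persons in every generation (an assumption
    inferred from the integer entries of Table 1).
    """
    q = 1.0 - p
    cases, susceptibles, immunes = [c1], [s1], [0]
    for _ in range(generations - 1):
        c, s, i = cases[-1], susceptibles[-1], immunes[-1]
        new_cases = round(s * (1.0 - q ** c))   # C(t+1) = S(t) (1 - q^C(t))
        cases.append(new_cases)
        susceptibles.append(s - new_cases)      # S(t+1) = S(t) - C(t+1)
        immunes.append(i + c)                   # I(t+1) = I(t) + C(t)
    return cases, susceptibles, immunes


cases, susceptibles, immunes = reed_frost(c1=1, s1=100, p=0.04, generations=8)
for t, (c, s, i) in enumerate(zip(cases, susceptibles, immunes), start=1):
    print(t, c, s, i, c + s + i)                # t, C, S, I, N
```

Under these assumptions the printed columns agree with the C, S, I and N columns of Table 1, and the total population stays at 101 throughout.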


In many cases we have to consider susceptibles added or subtracted
during each time period (by birth, death, migration, or otherwise).
They may be represented as $A_t$ (either a positive or a negative value)
and then (6) will become

$$S_{t+1} = S_t - C_{t+1} + A_t. \qquad (8)$$


The theory, as presented above, rests on certain assumptions that 
are debatable. 

The first one is that the infectivity of the organism is not altered 
during the course of the epidemic, i.e., p is constant. The theory that 
the infectivity of the parasite decreases with successive generations in 
the host has been proposed and defended but not actually proved. The
preponderance of evidence with respect to some infectious agents indi-
cates that their infectivity does not measurably change in the course of
an epidemic, while the possibility of varying infectivity cannot be denied
in the case of some others.

The second is that immunization of susceptibles through inapparent 
infections does not take place during epidemics. In some diseases (e. g. 
measles) there is no convincing evidence that immunity is conferred 
except by a clinical attack, although Stocks and Karn (1) have suggested
that there may be temporary “latent immunization” even in measles. In
a number of other diseases, however, immunization through inapparent 
infection is known to occur frequently. In such cases the theory is at 
fault, and we will discuss later how this difficulty may be dealt with. 

In the application of the theory to observed epidemics, it is necessary 
to allow for the operation of chance. The number of contacts between 
cases and susceptibles, and hence the number of new cases generated in 
the next time interval, is always subject to chance variation. Such 
variation will indirectly influence the entire subsequent course of the 
epidemic. 

The explanation for the rise and decline of epidemics, usually 
suggested, falls into three categories: 


First: alterations of the parasite, as proposed by Farr and Brownlee; 
Second: alterations of the environment; 
Third: alterations of the host. 


These factors would be represented in our formula (5) exclusively
by the probability q. Under this concept, q will vary according to the
intimacy of contact within the population group, according to the
relative ease of transmission of the infectious agent from person to
person, and according to the receptivity of the individuals to the same
agent. Within the same social group q would presumably be less for
influenza than for tuberculosis.

Although q is considered a constant as applied to a particular
epidemic, the theory by no means postulates that q, or p, is constant
for a given disease except within limits of time and space; i.e., in the
same community it may change from one occasion to another, and it may
be different in different communities at the same time. Nevertheless, the
constancy of q means that the aforesaid factors are regarded as constant
during each outbreak.


Consequently, our theory assumes that the rise and fall of epidemics, 
at least when evolving in a short period of time, will be dependent upon 
the numbers of susceptibles available and their depletion, through infec- 
tion and acquired immunity, to subliminal level or complete exhaustion. 

This theory, in the form presented, has been applied to observed 
epidemics of certain contagious diseases in small, more or less isolated 
human population groups (2). 

The theory in its general form is limited in its applicability to con- 
tagious diseases and cannot cope with all the situations which exist. 
It cannot explain: 


Diseases with multiple hosts, such as insect vectors and animal
reservoirs, where the situation is too complex for measurement of all
the factors;


Diseases with a sizable proportion of inapparent infections;


Diseases with a period of infectivity which is variable or does not
coincide with the clinical illness;


Diseases in which the mechanism of acquisition of immunity is more
complex than in measles, where solid permanent immunity is conferred
by, and only by, a recognizable attack.


It is, then, the purpose in this dissertation to consider the applica- 
tion of epidemic theory to more complex situations and make the 
necessary modifications of the basic formulae to adapt them to different 
conditions. Of the possible circumstances we will deal with the following 
ones: 


a) The occurrence of inapparent infections or uncounted cases; 
their influence on the course of the epidemic and the values of p as 
calculated from the observed data. 


b) The variability of the period of infectiousness from disease to 
disease and the variability of the incubation period of a disease; their 
influence on the daily course of the epidemic; comparison of values of 
p computed on a daily basis and on a generation basis. 


c) The variability of acquired immunity from disease to disease;
consideration of varying duration of immunity and its effect on the
pattern of incidence of the disease (explosive outbreaks, endemic level
with recurrent epidemic waves, stable endemic level).


d) Heterogeneity of populations as regards contact rates; con- 
sideration of its effect on the values of p computed from observed cases 
and susceptibles in the entire population. 


I. EFFECT OF INAPPARENT INFECTIONS AND UNDERCOUNTING OF CASES 


Most infectious diseases have a wide range of clinical severity 
extending all the way from an inapparent (subclinical) infection to a 
fatal illness. In some diseases inapparent infections are believed to 
outnumber clinical attacks by a considerable margin. For example, in 
poliomyelitis inapparent or at least unidentifiable infections have been 
roughly estimated to constitute all but 1 per cent of total infections; 
in diphtheria, Frost (3) has suggested that the ratio of immunizing 
infections to clinical cases may be 5 or 6 to 1, or even higher, depending 
on age, environment and other factors. On the other hand, in measles 
probably no infections are wholly inapparent. Some may, however, be 
relatively mild and hence escape detection, while others may fail of 
recognition due to errors in diagnosis or the lack of medical attention, 
even in the school epidemics from which much of our observational 
material is drawn. 


In studying the effect of inapparent infections or uncounted cases
on the epidemic curve, two assumptions will be made: first, that such
cases have the same capacity to infect susceptibles as do recognized cases;
and second, that the proportion of such cases is constant throughout an
epidemic. The first assumption may be incorrect, but the possible lesser
infectiousness of mild cases must be weighed against the fact that the
activities of such cases are not restricted by the illness. Under these
circumstances, assuming that the original count of susceptibles ($S_1$) is
correct, if x be the proportion of total cases that can be recognized, an
epidemic will be represented as shown in Table 2.


Therefore:

$$C_{t+1} = x \left[ S_1 - \frac{C_2 + C_3 + \cdots + C_t}{x} \right] \left( 1 - q^{C_t / x} \right) \qquad (9)$$

and

$$S_{t+1} = S_t - C_{t+1}. \qquad (6)$$
The coefficient x will vary between 1 and 0. When it is equal to 1
all cases are recognizable and we come back to the original theory.
When it reaches 0 no cases are recognizable and the epidemic evolves
undetected. If x becomes very small our formulae will not hold for
two reasons, namely (a) $S_1$ as counted will be very unlikely to be correct,
and (b) the first generations may have too few cases to make it probable
that some recognizable cases will appear and thus the epidemic may be
under way for some time before it can be noticed.


TABLE 2


Epidemic in which a constant proportion x of total cases is counted


  Generation   “Counted”   “Counted”                   Total    Total
  t            cases C     susceptibles S              cases    susceptibles
  1            C_1         S_1                         C_1/x    S_1
  2            C_2         S_2 = S_1 - C_2             C_2/x    S_1 - C_2/x
  3            C_3         S_3 = S_1 - (C_2 + C_3)     C_3/x    S_1 - (C_2 + C_3)/x

Two extreme situations may be considered in studying the effect of 
inapparent infections on the epidemic curve, namely (a) that clinically 
recognizable and recognized cases constitute a very large proportion of 
the total infections, and (b) that they constitute a very small proportion. 

In studying the first situation we were concerned with the effect of 
the inapparent infections on an analysis of an observed outbreak 
according to the epidemic theory. 

We computed four epidemics, all of them starting with one case and 
100 susceptibles. The values of p were .1, .04, .03, and .02. We deleted 
10 per cent of the cases in each generation and recalculated the number 
of susceptibles observed to remain in each generation. From these 
“counted” numbers of cases and susceptibles in each generation we
computed “observed” values of p. This “observed p” would be, for
each generation, the probability of adequate contact between any two
members of the population under study, as we would be able to compute
it from the data available in the circumstances described. The results
obtained are shown in Table 3.

The values of the “observed p” are smaller than the real ones,
except in the first generation of some of the epidemics before the
number of cases in each generation is large enough for inapparent cases
to occur. The variation in our examples may be described as follows:
There is an early decline in the “observed p” while the number of
cases is increasing slowly; a slight rise and stabilization while the
epidemic curve is steeply sloped upward; and a second sharp drop of
the “observed p” as the epidemic reaches its peak and declines.
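
The computation of an “observed p” can be sketched as follows. Solving the relation C(t+1) = S(t)(1 - q^C(t)) for q gives q = (1 - C(t+1)/S(t))^(1/C(t)), which is applied here to the counted figures. The sketch is a simplified, hypothetical version of the procedure: it works with fractional persons instead of whole numbers and applies the proportion x to every generation including the first, so its figures should not be expected to reproduce Table 3; the function name and defaults are assumptions.

```python
def observed_p_series(p, x, c1=1.0, s1=100.0, generations=8):
    """'Observed p' per generation when only a fraction x of cases is counted.

    No rounding to whole persons is done (an assumption of this sketch).
    """
    q = 1.0 - p
    # True course of the epidemic, counting every infection.
    total_cases, true_s = [c1], s1
    for _ in range(generations - 1):
        new = true_s * (1.0 - q ** total_cases[-1])
        total_cases.append(new)
        true_s -= new
    # What the observer records: a fraction x of the cases, and susceptibles
    # reduced only by the counted cases.
    counted = [x * c for c in total_cases]
    counted_s = [s1]
    for c in counted[1:]:
        counted_s.append(counted_s[-1] - c)
    # Solve C(t+1) = S(t) (1 - q^C(t)) for p from the counted figures.
    series = []
    for t in range(generations - 1):
        q_obs = (1.0 - counted[t + 1] / counted_s[t]) ** (1.0 / counted[t])
        series.append(1.0 - q_obs)
    return series


complete = observed_p_series(p=0.04, x=1.0)   # every case counted
deleted = observed_p_series(p=0.04, x=0.9)    # 10 per cent of cases deleted
for t, (a, b) in enumerate(zip(complete, deleted), start=1):
    print(t, round(a, 4), round(b, 4))
```

With every case counted the real p is recovered in each generation; with 10 per cent of the cases deleted the computed value falls below the real one.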


It seems, thus, that in an epidemic an analysis of which shows a 
decrease of p in successive generations the variation in the observed p 
might not indicate an overcounting of susceptibles at the beginning but 
rather an undercounting of cases during the course of the outbreak. 
This undercounting might be due either to the infections being in- 
apparent or to incompleteness of reporting, or to some other reason 
leading to a rather regular deletion of cases throughout the epidemic. 

In studying the second situation, where the clinically recognizable 
cases are a very small proportion of the total infections, we are con- 
cerned with the shape of the resulting curve, since it would be prac- 
tically impossible to make any count of susceptibles or total cases in 
such circumstances. 

We computed three epidemics, all of them starting with one case
and 10,000 susceptibles but having values of p of .001, .0004, and .0003
(respectively, epidemics No. 5, No. 6 and No. 7). We deleted 99 per
cent of the cases in each generation and took the remainder as our
“counted” cases.

We did not recompute values of p from “counted” cases and sus-
ceptibles because in such a situation counts of susceptibles are not easy
to obtain and are highly unreliable. Instead, we tried to find values of
p and $S_1$ that would fit a theoretical epidemic curve to our curve of
“counted” cases. It is easy to guess at approximate values and trial
and error then permits obtaining a good enough fit, if such fit is possible.


The curves of “counted” cases were found to fit very satisfactorily
epidemics in which all cases were recorded, calculated with a slightly
smaller K (considering N = $S_1$ + $C_1$) but populations 100 times smaller.
These “comparable” epidemics are fitted almost as well as it is possible
to fit them. The number of susceptibles cannot be much larger than
100, since the epidemics No. 5, No. 6 and No. 7 begin their decline
after a total of 64, 78 and 53 cases, respectively. The comparability is
shown in Tables 4, 5 and 6.


TABLE 4


Epidemic No. 5, where S_1 = 10,000; p = .001, and 1 per cent of cases are apparent


                                      TOTAL CASES* IN COMPARABLE
        TOTAL          COUNTED        EPIDEMIC WHERE
  t     INFECTIONS     CASES*         S_1 = 100; p = .1
  1         1            -              -
  2        10            -              -
  3        99            1              1
  4       933            9             10
  5     5,436           54             59
  6     3,508           35             31
  7        15            -              -
  8         0            0              0

* - = no counted cases in the presence of inapparent cases; 0 = no apparent
or inapparent cases.


TABLE 5


Epidemic No. 6, where S_1 = 10,000; p = .0004; and 1 per cent of cases are apparent


TABLE 6


Epidemic No. 7, where S_1 = 10,000; p = .0003; and 1 per cent of cases are apparent


                                      TOTAL CASES IN COMPARABLE
        TOTAL          COUNTED        EPIDEMIC WHERE
  t     INFECTIONS     CASES          S_1 = 100; p = .028
  1         1            -              -
  2         3            -              -
  3         9            -              -
  4        27            -              -
  5        80            1              1
  6       234            2              3
  7       645            6              8
  8     1,584           16             18
  9     2,806           28             28
  10    2,625           26             24
  11    1,083           11              9
  12      251            3              2
  13       47            -              -
  14        7            -              -
  15        1            -              -
  16        0            0              0


Thus in the described conditions, an epidemic might go on for
several generations before the first cases could be recognized and might
appear to end several generations before it actually did end. The shape
of the epidemic curve, as observed, would be similar to the shape of
another theoretical epidemic with many fewer susceptibles and a much
higher p. However, the “counted” cases would be sparsely distributed
through a very large population that would have to be considered
involved in the epidemic. Thus, if we were to assume that the curve
fitted to the apparent cases was the true one and calculated K for the
population, it would be absurdly high. Taking N as 10,001 (the minimum
possible in our calculated epidemics since $S_1$ = 10,000), the values
of K, computed from p in the fitted curves, would be:


  EPIDEMIC NO.     K (FROM FITTED p)     REAL K
  5                1,000                 10
  6                  340                  4
  7                  280                  3


Furthermore, we would be assuming that most cases were recognizable 
and only a small percentage of the population was susceptible which 
could not be justified by the past history of the disease in the community. 
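
The arithmetic behind these values of K follows from equation (1), p = K/(N - 1), so that K = p(N - 1). The short sketch below repeats it; the helper name `contacts_from_p` is hypothetical, and the fitted value of p for epidemic No. 6 is left out because it is not legible in the text as it stands.

```python
def contacts_from_p(p, n):
    """Invert equation (1): K = p (N - 1)."""
    return p * (n - 1)


N = 10_001                                          # 10,000 susceptibles plus the first case
fitted_p = {5: 0.1, 7: 0.028}                       # p of the comparable epidemics (Tables 4 and 6)
real_p = {5: 0.001, 6: 0.0004, 7: 0.0003}           # p actually used in the computation

for number in sorted(real_p):
    real_k = round(contacts_from_p(real_p[number], N))
    if number in fitted_p:
        fitted_k = round(contacts_from_p(fitted_p[number], N))
    else:
        fitted_k = "not computed here"
    print(number, fitted_k, real_k)
```

The fitted curves therefore imply roughly a hundred times as many adequate contacts per person as were actually assumed.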

For a number of human and animal diseases, such as infection with 
C. diphtheriae, N. meningitidis, and poliomyelitis, we have good evidence 
that inapparent infections greatly outnumber clinical cases, although we 
do not know how the two categories of cases compare for infectivity. 


II. LENGTH OF INFECTIOUSNESS AND OF INCUBATION PERIOD 


Most epidemics recorded day by day in small closed communities, 
such as schools, present a fairly constant pattern in the distribution of 
cases. The epidemic generations can be well determined because the 
new cases are clustered in a few central days of each period. As the 
epidemic progresses these clusters have more and more cases and cover 
more and more days. When the epidemic begins to subside the number 
of cases per generation decreases, but the number of days during which 
new cases appear in each period will decrease much more slowly. 

This fact suggests that epidemics may be calculated on a daily basis
if we assume a value of p for a day (p’, with q’ = 1 - p’) instead of
for a whole generation, and allow for a certain variability of the incu-
bation period. This variability of the incubation period has to be defined
according to a probability distribution. Sartwell (4), having reviewed
a series of epidemics of infectious diseases reported in the literature in
which the length of the incubation period could be studied, verified
that its distribution is usually a logarithmic normal curve, the standard
deviation of which is fairly constant and independent of the mean and
of the disease considered. The anti-logarithm of this standard deviation,
which the author called the dispersion factor, fell between 1.2 and 1.5.
Since these values include, besides the dispersion factor of the incubation
periods, the variability of accuracy of reporting and chance variation,
we used a dispersion factor of 1.1 for our computations and 13 days as
the geometric mean of the incubation periods. With these two values
we calculated, for daily periods, the probabilities that cases infected in
one day would come down with the disease. Since the numbers
of infections are small and the daily new cases have to be expressed by
integers, the approximations are rather rough.
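
One way of obtaining such daily probabilities is sketched below. It treats the logarithm of the incubation period as normally distributed, with mean equal to the logarithm of the geometric mean (13 days) and standard deviation equal to the logarithm of the dispersion factor (1.1). Assigning to day d the probability mass between d - 0.5 and d + 0.5 is an assumption of this sketch; the text does not say how the daily grouping was made, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def incubation_probabilities(geometric_mean=13.0, dispersion_factor=1.1, max_days=40):
    """Probability that onset falls on each whole day after infection."""
    mu = math.log(geometric_mean)          # mean of log incubation period
    sigma = math.log(dispersion_factor)    # s.d. of log incubation period

    def cdf(days):
        return normal_cdf((math.log(days) - mu) / sigma)

    # Day d receives the mass between d - 0.5 and d + 0.5 (assumed grouping).
    return {d: cdf(d + 0.5) - cdf(d - 0.5) for d in range(1, max_days + 1)}


probabilities = incubation_probabilities()
for day in range(10, 18):
    print(day, round(probabilities[day], 3))
```

Nearly all of the probability falls within a few days of the thirteenth, which is why the cases of each generation cluster in a few central days.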

Four epidemics were calculated starting with 1 case and 100 sus- 
ceptibles. In epidemic A we considered each case to be infectious only 
during the day of onset and in epidemic B, during the day of onset and 
the following day; both have a daily p (p’) of .13. In epidemic C we 
considered infectiousness during the day of onset and a daily p (p’) of 
.04; in epidemic D infectiousness during the day of onset and the 
following day, and daily p (p’) of .04. (See Fig. 1). 

These epidemics were calculated on a daily basis; i.e., for each day 
infectious cases and susceptibles were counted and the number of infec- 
tions computed. The new infections were deleted from the number of 
susceptibles. As soon as cases stopped being infectious they were included 
among the immunes. 
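
The day-by-day procedure just described can be sketched as follows. This is a simplified illustration rather than the computation actually carried out: fractional persons are used instead of integers, the incubation probabilities are a made-up three-point distribution, and the function and parameter names are hypothetical.

```python
def simulate_daily(p_daily, infectious_days, incubation, susceptibles=100.0, days=120):
    """Daily epidemic started by one case coming down on day 0.

    incubation maps a delay in days (at least 1) to the probability that a
    person infected today comes down with the disease after that delay.
    Returns the daily new cases and the susceptibles left at the end.
    """
    q_daily = 1.0 - p_daily
    new_cases = [0.0] * days
    new_cases[0] = 1.0
    remaining = susceptibles
    for day in range(days):
        # Cases are infectious on the day of onset and the following
        # infectious_days - 1 days; afterwards they count as immune.
        start = max(0, day - infectious_days + 1)
        infectious = sum(new_cases[start:day + 1])
        infected_today = remaining * (1.0 - q_daily ** infectious)
        remaining -= infected_today        # new infections leave the susceptibles
        for delay, probability in incubation.items():
            if day + delay < days:
                new_cases[day + delay] += infected_today * probability
    return new_cases, remaining


incubation = {12: 0.25, 13: 0.50, 14: 0.25}          # illustrative values only
curve_a, left_a = simulate_daily(0.13, 1, incubation)  # infectious for 1 day
curve_b, left_b = simulate_daily(0.13, 2, incubation)  # infectious for 2 days
print(round(sum(curve_a), 3), round(sum(curve_b), 3))
```

Lengthening the period of infectiousness from one day to two, at the same daily p, leaves fewer susceptibles uninfected at the end.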

We may express the daily probability of a person infected within a
given day coming down with the disease m + r days later as $\pi_{m+r}$,
m being the geometric mean of the incubation periods and r a positive
or negative integer indicating how many days before or after the mean
we consider the probability ($\sum_r \pi_{m+r} = 1$).

If we consider cases infectious for 1 day (epidemics A and C),
the number of cases coming down on any day d will be

$$NC_d = \sum_{r=-m+1}^{\infty} \pi_{m+r}\, S_{d-m-r} \left( 1 - q'^{\,NC_{d-m-r}} \right). \qquad (10)$$

If we consider cases infectious for z days (in epidemics B and
D, z = 2), we will have on day d new cases $NC_d$ and infectious cases
$IC_d$. The infectious cases will be

$$IC_d = NC_d + NC_{d-1} + \cdots + NC_{d-z+1} = \sum_{i=d-z+1}^{d} NC_i \qquad (11)$$

and the new cases will be

$$NC_d = \sum_{r=-m+1}^{\infty} \pi_{m+r}\, S_{d-m-r} \left( 1 - q'^{\,IC_{d-m-r}} \right). \qquad (12)$$
Fig. 1. Epidemics having index of dispersion of length of incubation
period = 1.1.


Epidemic A: daily p = .13, period of infectiousness = 1 day. 
Epidemic B: daily p = .13, period of infectiousness = 2 days. 
Epidemic C: daily p = .04, period of infectiousness = 1 day. 
Epidemic D: daily p = .04, period of infectiousness = 2 days. 
After our epidemics were computed day by day, we grouped the cases
by generations and then recalculated p on the basis of generations
(Table 7).

Let us call the new cases occurring on each day of generation
t: a, b, c, ..., l, m, n. If the cases remain infectious for z days, the
number of infectious cases and the probability of avoiding infection in
each successive day are as shown in Table 8.


TABLE 8


Theoretical probabilities of avoiding contact


                                 PROBABILITY OF AVOIDING
  DAY     INFECTIOUS CASES       INFECTION
  1       a                      q'^a
  2       a + b                  q'^(a+b)
  3       a + b + c              q'^(a+b+c)
  ...     ...                    ...
          l + m + n              q'^(l+m+n)
          m + n                  q'^(m+n)
          n                      q'^n


Since infectiousness lasts for z days, each number of daily new cases
figures z times in the daily numbers of infectious cases. The probability
($Q_t$) of any of the susceptibles at the beginning of the period avoiding
infection all through it is the product of the daily probabilities, that is:

$$Q_t = q'^{\,z(a+b+c+\cdots+l+m+n)}.$$

Thus, if we have $S_t$ susceptibles at the beginning of generation t,
the number of people infected during the same generation, or of cases
in generation t + 1, will be (since a + b + c + ... + n = $C_t$):

$$C_{t+1} = S_t\,(1 - Q_t) = S_t\,(1 - q^{C_t}),$$

where q is the probability of escaping adequate contact per generation
and is

$$q = q'^{\,z}.$$


We see thus that the actual distribution of cases by day is not 
theoretically important for the occurrence of cases. 
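
This equivalence is easy to check numerically. In the sketch below the daily onsets, the value of q’ and the function names are hypothetical; the infectious cases on each day are built as in Table 8, and the product of the daily probabilities of escape is compared with the single-generation form (q’^z) raised to the number of cases.

```python
import math


def infectious_by_day(new_cases, z):
    """Infectious cases on each day when every case is infectious for z days."""
    n_days = len(new_cases) + z - 1
    return [sum(new_cases[max(0, d - z + 1):d + 1]) for d in range(n_days)]


def generation_escape_probability(new_cases, z, q_daily):
    """Product of the daily probabilities of avoiding adequate contact."""
    return math.prod(q_daily ** ic for ic in infectious_by_day(new_cases, z))


new_cases = [1, 3, 6, 3, 1]      # hypothetical daily onsets a, b, c, ...
z, q_daily = 2, 0.96             # hypothetical values

day_by_day = generation_escape_probability(new_cases, z, q_daily)
per_generation = (q_daily ** z) ** sum(new_cases)
print(infectious_by_day(new_cases, z))
print(day_by_day, per_generation)
```

However the fourteen onsets are spread over the days of the generation, the two figures agree, because each case contributes exactly z infectious days.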

However, when we do calculations, using a distribution like ours,
the number of cases in the first few and last days of each generation is
very small. After the susceptibles have been depleted through a few
epidemic periods, those numbers of cases may not be enough to induce
new cases. Thus, our recomputed p on a generation basis may appear
to decrease as the epidemic goes on. Furthermore, as all numbers of
cases have to be approximated to integers and not very many days in
each period have cases (in our calculations), the approximations do not
always compensate each other, thus giving p some variability.

The above considerations give a complete explanation of the results 
presented in the table. 


III. EFFECT OF VARIATION IN THE DURATION OF IMMUNITY 


The basic epidemic theory postulates that once infected an individual 
cannot be reinfected, i. e., that immunity is lasting. While this is true 
of measles and most of the other diseases to which efforts have been 
made to apply the theory, it is not true of some other infections. It 
seems worth while, therefore, to modify the equations so as to provide 
for reintroduction into the susceptible population of individuals who 
have lost their immunity, and then to determine to what sorts of epi-
demic patterns this will lead. In doing this, we are of necessity assigning
an arbitrary and fixed value to the duration of immunity. In reality, 
the duration of immunity depends upon the individual’s capacity to 
develop antibodies, the kind and severity of infection he has sustained, 
and other factors. Furthermore, acquired immunity is lost gradually 
rather than abruptly, and may probably be reinforced through exposure 
while still immune (in the case of diphtheria there is good evidence 
that this is true). 

If we consider a disease an attack of which confers only a temporary 
immunity, lasting for n generations, the basic formulae of the epidemic 
theory will become: 


Crs = 


Sta = Cra + + At (13) 
t 
= (Cr) (14) 


assuming a case fatality rate equal to zero. 

In the special case of n = 0, i. e., for a disease in which no immunity
is conferred by an attack, and A = 0, i. e., when the epidemic takes
place in a period so short that incoming susceptibles do not need
to be taken into consideration, formulae (13) and (14) will become,
respectively:

$$S_{t+1} = N - C_{t+1}$$

and

$$I_{t+1} = 0.$$


Solid immunity will be represented by $n = \infty$.

When A = 0 and n is larger than the number of generations needed
for the epidemic to evolve through the population and come to an end 
because the susceptibles have reached a subliminal level, the fact that 
the immunity is only temporary has no effect on the epidemic curve. 

The same may be true when a few individuals revert to susceptibility in 
the last generations of the epidemic but not in numbers large enough 
to raise the susceptibles above the threshold level. 
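
The stepwise bookkeeping for temporary immunity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cases of generation t are taken to be immune during generations t + 1 through t + n and to be susceptible again afterwards, expected cases are rounded to the nearest integer, and the function and argument names are hypothetical.

```python
import math


def temporary_immunity_epidemic(c1, s1, p, n, a=0, generations=30):
    """Stepwise epidemic with immunity lasting n generations.

    Assumption: cases of generation t are immune for generations
    t+1 .. t+n and rejoin the susceptibles afterwards.
    n=None stands for solid (permanent) immunity.
    """
    q = 1.0 - p
    cases = [c1]          # cases[k] holds the cases of generation k + 1
    susceptibles = [s1]
    for t in range(generations - 1):
        c_t, s_t = cases[t], susceptibles[t]
        # A susceptible escapes all c_t cases with probability q ** c_t.
        c_next = math.floor(s_t * (1.0 - q ** c_t) + 0.5)
        # Former cases whose n generations of immunity have just run out.
        returning = cases[t - n] if n is not None and t - n >= 0 else 0
        cases.append(c_next)
        susceptibles.append(s_t - c_next + returning + a)
    return cases, susceptibles


for n in (0, 1, 3, 5, None):
    run, _ = temporary_immunity_epidemic(c1=1, s1=100, p=0.04, n=n)
    print("n =", n, run)
```

With n=None the sketch reduces to the basic theory; with n=0 and a=0 the sum of cases and susceptibles stays at its initial value in every generation.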

In studying the effects of temporary immunity two series of situations 
were considered. In the first series: 


$C_1 = 1$, $S_1 = 100$,
p = .1, .04, .03, .02, .016 and .005,
n = 0, 1, 3 and 5 time periods t.

In the second:

$C_1 = 1$, $S_1 = 200$


and p and n have the same values as in the first series. Thus, a total
of 56 epidemics were computed (28 in each series). In all of them it
was assumed that no new susceptibles were added (i. e., A = 0). The
epidemics are presented in Figs. 2a-2e.

Fig. 2(b). Epidemics with N = 101, Duration of Immunity = 3 Generations,
Selected Contact Rates.

Fig. 2(c). Epidemics with N = 101, Duration of Immunity = 5 Generations,
Selected Contact Rates.


186 JOAQUIM DE OLIVEIRA COSTA MAIA 


Fig. 2(d). Epidemics with N = 201, Duration of Immunity = 1 Generation,
Selected Contact Rates.

Fig. 2(e). Epidemics with N = 201, Duration of Immunity = 3 Generations,
Selected Contact Rates.


In Table 9 columns (c), (g) and (j) express length of time in 
epidemic generations. 

The words employed to express the results in the various conditions 
are used with the following meaning: 


“Explosive”: an outbreak runs through the population to the complete
disappearance of the disease.

“Endemic level”: a stable level of incidence is obtained, whether
or not it is preceded by damping waves.

“Cycle”: a theoretical stable cycle of incidence is obtained, generally
after several damping waves.


The results obtained in our examples may be summarized as follows: 


Equilibrium is reached more and more promptly as smaller values
of p are taken, and for equal values of p equilibrium is reached more
rapidly where n = 0 than where n = 1. Epidemics starting with 200
susceptibles require longer to reach equilibrium than those with 100 
susceptibles, n and p being equal. 


For n = 3 and n = 5 a few epidemics show well-marked cycles. The 
epidemic pattern (i. e., whether there is a single explosive epidemic, a 
series of cycles, or an endemic state) appears to be related to a ratio 
shown in columns (h) and (k). The numerator of these ratios is the 
duration in generations of an epidemic, immunity being permanent; 
and the denominator, the number of generations of immunity in a given 
series, calculated from the same values of $S_1$ and p. Thus, the numerator
in each instance is the value in column (d), and the denominator is the 
value of n (3 or 5, respectively). When these ratios were smaller than 
1.7 we had explosive epidemics; when they varied from 1.8 to 3.0 we 
had cyclic equilibrium; when they were greater than 3.6 we had an 
endemic level. In our examples no values between 3.0 and 3.6 occurred. 
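
The rule of thumb in the preceding paragraph can be written as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and the cut-offs are simply the ranges reported for the computed examples, not theoretical thresholds.

```python
def epidemic_pattern(duration_permanent, n):
    """Classify the pattern from the ratio described in the text.

    duration_permanent: generations the epidemic lasts when immunity
        is permanent (numerator of the ratio).
    n: generations of immunity (denominator of the ratio).
    """
    ratio = duration_permanent / n
    if ratio < 1.7:
        return ratio, "explosive"
    if 1.8 <= ratio <= 3.0:
        return ratio, "cycle"
    if ratio > 3.6:
        return ratio, "endemic level"
    # Ratios in the gaps did not occur in the computed examples.
    return ratio, "not observed in the examples"


# Hypothetical input: an epidemic lasting 12 generations, n = 5.
print(epidemic_pattern(12, 5))
```

The gaps between the three ranges are returned as a separate label rather than forced into a class, since the text reports no examples falling there.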

An example of a situation that can reasonably be explained by 
temporary immunity being too long for the size of the community and 
its p is the epidemiological pattern of common colds in Spitsbergen 
described by Paul and Freese (5). Their observations were made in 
Longyear City (population of about 500 people) from October 1930 to 
August 1931. The harbor is blocked by ice during the winter and all 
communications are then interrupted. The shipping season lasts 3 to 5 
months. When it ends, the incidence of “colds” decreases until it 
reaches zero. As soon as the harbor is open, about 48 hours after the
first ship comes in, the first cases begin to appear and the incidence
reaches very high rates (“About 75 per cent of the winter residents
had suffered an attack of ‘common cold’ by the end of the first month”).
There is a great turnover of the population during summer and, after
the first, explosive epidemic, “colds” will occur in small numbers until
the next winter season, when again they tend to disappear. According 
to the authors, this pattern is confirmed by local history and records 
and seems quite characteristic of isolated Arctic communities. This 
pattern may be explained by assuming an immunity lasting for a few 
weeks or months following an attack of the common cold. Thus, after 
the harbor is closed and the community isolated, the disease would die 
out for lack of susceptibles; but by the arrival of the first boat, most of
the population would have become susceptible again and a large epidemic
would result from the introduction of new infections. The peculiarity
of the epidemic behavior of colds in such communities, according to
our hypothesis, is dependent on two factors: their isolation during a 
considerable part of the year, and their small population, which results 
in the number of susceptibles remaining after an epidemic being too 
small to support the disease. 

The length of the cycles obtained with n = 3 was, in all cases, 7
generations; with n = 5, it was 11 generations. Although the evidence
is not extensive enough to make precise inferences, we may say that the
cycles, when obtained, will be longer than 2n generations.

The above results show that, in a large population, with a small value 
of p and a large value of n, we might theoretically expect a long cycle 
with a wide range between maximum and minimum, in the absence 
of any variation of p due to changes in the parasite’s infectivity or 
transmissibility, population movements, etc. 


IV. APPLICATION OF EPIDEMIC THEORY TO STRATIFIED POPULATIONS 
In the basic formula of the epidemic theory

$$C_{t+1} = S_t \left(1 - q^{C_t}\right) \tag{5}$$


it is assumed that we may consider a probability of adequate contact to 
exist between any two persons of the population, and that this prob- 
ability varies within limits narrow enough so that one single average 
value of p may be applied to the entire population. 

This is probably true in certain cases, such as schools or institutions 
of special kinds, but can hardly be expected to hold in aggregations of 
populations like cities where contact among people is highly facilitated 
by, for instance, occupational or residential grouping and made very 
difficult when such factors work against it. 

Thus, it seems logical to consider a population formed of several 
strata within each of which there is a certain p and among which there 
is a different and smaller probability of adequate contact. Since these 
strata will represent groupings according to social factors, a different 
value of p may be assigned to each one of them. 

We will identify each stratum by a different number (1, 2, 3, ...)
and thus we have in generation 1 of our epidemic:

$C_{1,1}$ and $S_{1,1}$ in stratum 1; $C_{2,1}$ and $S_{2,1}$ in stratum 2, etc.

Let $p_1, p_2, p_3, \ldots, p_m$ be the probabilities of adequate contact in the
strata 1, 2, 3, ..., m, respectively, and $p_i$ be the probability of adequate
contact between any specified individual in one stratum and any
specified individual in another (different) stratum. Cases in each
stratum (stratum x) in the time period t + 1 will be given by the
formula:

$$C_{x,t+1} = S_{x,t}\left(1 - q_x^{C_{x,t}} \times q_i^{\sum_{y=1}^{m} C_{y,t} - C_{x,t}}\right) \tag{15}$$

The total number of cases in the same period ($C_{t+1}$) will be:

$$C_{t+1} = \sum_{x=1}^{m} C_{x,t+1} = \sum_{x=1}^{m} S_{x,t}\left(1 - q_x^{C_{x,t}} \times q_i^{\sum_{y=1}^{m} C_{y,t} - C_{x,t}}\right) \tag{16}$$

or

$$C_{t+1} = C_{1,t+1} + C_{2,t+1} + \cdots + C_{m,t+1} \tag{17}$$


In actually observing an epidemic in any population that might be 
considered stratified we can ordinarily only count our total number of 
cases and, possibly, susceptibles, in each generation; not cases and 
susceptibles in each different stratum. This only allows us an estimation 
of an average value of p or q by assuming a situation for the total
population expressed by a formula similar to (5):

$$\sum_{x=1}^{m} C_{x,t+1} = \left[\sum_{x=1}^{m} S_{x,t}\right]\left(1 - q_t^{\sum_{x=1}^{m} C_{x,t}}\right) \tag{18}$$

We write here $q_t$, meaning the average value of q operating during
period t, as the average might vary from one period to another.

In this formula, however, $q_t$ does not refer any longer to contact
between any two members of the population, as q did in (5). Now we
have considered contact between susceptibles and cases. Thus, we shall
define $p_t = 1 - q_t$ as the apparent average probability of adequate
contact between any given susceptible and any given case, during time
period t.

From (18):

$$q_t = \sqrt[\sum_{x=1}^{m} C_{x,t}]{\frac{\sum_{x=1}^{m} S_{x,t} - \sum_{x=1}^{m} C_{x,t+1}}{\sum_{x=1}^{m} S_{x,t}}} \tag{19}$$

By substituting in (19) from (16), and making proper algebraic
operations:

$$q_t = q_i \sqrt[\sum_{x=1}^{m} C_{x,t}]{\frac{\sum_{x=1}^{m} S_{x,t}\left(q_x / q_i\right)^{C_{x,t}}}{\sum_{x=1}^{m} S_{x,t}}} \tag{20}$$
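
The two routes to the average q can be compared numerically. The sketch below is a hedged illustration, not the author's procedure. It assumes that a susceptible in stratum x escapes infection only by avoiding every case in its own stratum (probability $q_x$ each) and every case elsewhere (probability $q_i$ each). It also assumes that formula (20) is the form obtained by substituting those expected cases into (19). The function names and the sample figures are hypothetical.

```python
def expected_new_cases(cases, susceptibles, q_within, q_between):
    """Unrounded expected cases per stratum (assumed reading of formula 15)."""
    total = sum(cases)
    return [s * (1.0 - q ** c * q_between ** (total - c))
            for c, s, q in zip(cases, susceptibles, q_within)]


def q_from_totals(total_cases, total_susceptibles, total_new_cases):
    """Formula (19): solve (18) for the average q of the generation."""
    escaped = (total_susceptibles - total_new_cases) / total_susceptibles
    return escaped ** (1.0 / total_cases)


def q_from_strata(cases, susceptibles, q_within, q_between):
    """Assumed form of (20): the same quantity written by stratum."""
    weighted = sum(s * (q / q_between) ** c
                   for c, s, q in zip(cases, susceptibles, q_within))
    return q_between * (weighted / sum(susceptibles)) ** (1.0 / sum(cases))


# Illustrative (hypothetical) generation with three strata.
cases = [2, 5, 20]
susceptibles = [190, 450, 600]
q_within = [0.997, 0.995, 0.992]
q_between = 0.9999

new_cases = expected_new_cases(cases, susceptibles, q_within, q_between)
q19 = q_from_totals(sum(cases), sum(susceptibles), sum(new_cases))
q20 = q_from_strata(cases, susceptibles, q_within, q_between)
print(q19, q20, 1.0 - q19)   # the last value is the apparent p for the period
```

Under these assumptions the two values agree up to floating-point error, and both lie below the inter-strata q, as the discussion of the fraction under the radical requires.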


In formula (20) the fraction under the radical is necessarily smaller
than 1, since $q_x$ is smaller than $q_i$ (it does not make sense to consider
a stratum in which the individuals have less probability of adequate
contact with other members of the same stratum than with members of
the other strata). But while the value of this fraction decreases inversely
with the average number of cases in each stratum, the root increases as
the total of cases increases. Thus the value of $q_t$ may be expected to
increase when

$$\sum_{x=1}^{m} C_{x,t}$$

increases. But this variation will depend, too, upon the distribution of
cases and susceptibles among the different strata, $q_t$ decreasing when
higher proportions of the susceptibles and cases belong to the strata
with smaller values of $q_x$. This variation of $p_t$ during the evolution
of the epidemic will depend, too, on the values of $q_i$ and the different $q_x$'s.
Thus, as $q_i$ and the $q_x$'s approach one another, the fraction $q_x/q_i$
will tend to one, and so will the values of the fraction under the radical
and of the root, $q_t$ varying little and tending to $q_i$. When $q_i$ equals $q_x$
and all $q_x$'s are equal, no stratification exists. Therefore, the greater
the difference between $q_i$ and $q_x$, the more widely $q_t$ will vary and the
lower its level.

It must be borne in mind that $p_t$ calculated in this way is the average
probability of adequate contact between any given susceptible and any 
given case during a generation as the numbers and proportions of cases 
and susceptibles in each stratum vary. 

The average p for the whole population may remain constant if the 
number of individuals in each stratum and the values of $q_x$ and $q_i$
remain the same while $q_t$ changes because the portion of the population
paired for its computation gives, in each generation, a different set of 
mutual probabilities to be averaged. 

Actually, to study the variation of $p_t$ it is necessary to calculate
epidemics set up in a stratified way. Two such epidemics were calculated
(see Fig. 3), in each one of which three strata were considered.
The first one started from the following situation:

$C_{1,1} = 0$, $S_{1,1} = 200$, $q_1 = 0.997$,
$C_{2,1} = 0$, $S_{2,1} = 500$, $q_2 = 0.995$,
$C_{3,1} = 1$, $S_{3,1} = 1{,}000$, $q_3 = 0.992$,
$q_i = 0.9999$,


that is, the first stratum has 200 susceptibles and a q of 0.997, the
second one has 500 susceptibles and a q of 0.995, the third has 1,000
susceptibles and a q of 0.992, the inter-strata $q_i$ is 0.9999 and one case
is introduced into the third stratum, during the first time period, to 
start the epidemic. 

The second epidemic calculated starts from the same situation but 
a certain number of new susceptibles is introduced into each stratum 
at each time period: six into the first, fifteen into the second and thirty 
into the third. 

Cases were calculated for each stratum at each time period by the 
use of formula (15). From the calculated numbers of cases and 
susceptibles $q_t$ was determined by use of the formula (19). Seven-figure
logarithms were used in the calculations and cases were approximated
to the nearest integer.
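
A calculation of this kind can be sketched in Python from the starting situation given above. The sketch rests on an assumed reading of formula (15): a susceptible escapes infection only by avoiding each case in its own stratum and each case in the other strata. It uses ordinary floating-point arithmetic instead of seven-figure logarithms, so its output need not agree figure for figure with the published tables. The names `stratified_step` and `incoming` are hypothetical.

```python
import math


def stratified_step(cases, susceptibles, q_within, q_between, incoming=None):
    """Advance the stratified model by one generation."""
    total_cases = sum(cases)
    incoming = incoming or [0] * len(cases)
    new_cases, new_susceptibles = [], []
    for c_x, s_x, q_x, a_x in zip(cases, susceptibles, q_within, incoming):
        # Chance of escaping every case inside and outside the stratum.
        escape = (q_x ** c_x) * (q_between ** (total_cases - c_x))
        c_next = math.floor(s_x * (1.0 - escape) + 0.5)  # nearest integer
        new_cases.append(c_next)
        new_susceptibles.append(s_x - c_next + a_x)
    return new_cases, new_susceptibles


# Starting situation of the first epidemic: one case in stratum 3.
cases = [0, 0, 1]
susceptibles = [200, 500, 1000]
q_within = [0.997, 0.995, 0.992]
q_between = 0.9999

history = [(cases, susceptibles)]
while sum(cases) > 0 and len(history) < 100:
    cases, susceptibles = stratified_step(cases, susceptibles, q_within, q_between)
    history.append((cases, susceptibles))

for generation, (c, s) in enumerate(history, start=1):
    print(generation, c, s)
```

Passing `incoming=[6, 15, 30]` at every step gives the corresponding sketch for the second epidemic, in which new susceptibles enter each stratum in every time period.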

The first epidemic evolves in twelve generations, ending because the 
remaining susceptibles have become too few for the two cases in gen- 
eration 12 to give origin to a new case. 


Epidemic No. 1, Generation 12

STRATUM    REMAINING SUSCEPTIBLES
1          148
2           56
3            0
Total      204


The second epidemic attains an equilibrium in the sixty-fourth 
generation when the number of new cases in each stratum equals the 
number of incoming susceptibles. The stabilized situation is the 
following:

Epidemic No. 2, Generation 64

STRATUM    CASES    REMAINING SUSCEPTIBLES
1            6      285
2           15      192
3           30      137
Total       51      614


The results are illustrated in the graphs, Fig. 3. Both graphs show
that $p_t$ tends to vary inversely with the total number of cases. The
waves in the two curves do not, however, follow a symmetric pattern.
The turning points in the curves of $p_t$ occur one or two generations
later than the ones in the curves of cases. This is due to shifting in
the proportions of cases in each stratum, the movements upwards or
downwards being initiated in the stratum with higher $p_x$ (stratum 3).

In computing $p_t$ from actual situations we have to remember that it
does not have the theoretical consistency of the p in the original theory 
and is really a different entity. But it will be, in most of the instances, 
the only possible approach to an estimation of the real p’s operating in 
our population. 

The fact that the two concepts of p are different should not be held 
too important. The p in the original theory depends upon the disease 
considered but is not characteristic of it. It is possible that different 
strains of the same agent might have different infectiousness. Further- 
more, it is quite probable that the same strain, with exactly the same 
potentialities, will spread more or less easily in different environmental 
conditions; the size of the community, the composition and habits of 
the population are undoubtedly very important factors in the deter- 
mination of p. 

The larger our community, the more important the effects of stratification
will be in the spread of contagious disease. This will make $p_t$
more variable and the consideration, even for short periods, of only one 
stable value of p less tenable. However, the larger the community, the 
more difficult it will be to define strata and to count cases and susceptibles 
in each one of them. In such instances $p_t$ will probably be our best
measure of the transmission of disease. 

From this theoretical speculation we may conclude that, if a complex
population like that of any town or community may be considered as
stratified from the point of view of probabilities of adequate contact
between any two of its members, we should expect both during sporadic
outbreaks (comparable with our epidemic No. 1) and in endemic situations
with fluctuating levels of cases (comparable with our epidemic
No. 2) that the average probability of adequate contact between any
given susceptible and any given case during an epidemic generation
would vary inversely with the number of cases observed in the same
generation.

Fig. 3. Epidemics in Stratified Populations, with Calculated Contact
Rates. Panels: Epidemic No. 1; Epidemic No. 2. Legend: cases.

Evidence to this purpose drawn from actual observations would only 
support our hypothesis without proving it. Only accumulation of such 
observations where strata could be defined and their cases and suscep- 
tibles counted and found to follow, in detail, the theoretical calculations 
within chance variation could be considered as a proof. 

In the case of diseases with seasonal variation of incidence, the more 
commonly suggested explanation of such cycles, in the light of the 
epidemic theory, assumes that, either because of the climatic conditions 
that facilitate the transmission of disease at certain times of the year, 
or because of social rearrangements (like the opening and closing of 
schools, the movements due to vacations, the drafting of young men for 
compulsory military service in certain countries, etc.) or of both, there 
is an induced increase of p and thus the seasonal epidemics are started. 

If we accept an increase of p due to climatic factors at certain
seasons, we will have to expect $q_x$ and $q_i$ in (20) to diminish in about
the same proportion. Thus the fraction $q_x/q_i$ would keep a constant value
and so would the fraction under the radical and its root, when cases,
susceptibles and their distribution are considered invariable. Therefore,
$q_t$ would vary proportionally to $q_i$; or $p_t$ would increase in the same
proportion as the individual $p_x$'s and $p_i$. The value of $p_t$ would still
vary inversely with the number of cases, but at a different level.
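
The proportionality argument can be checked with a short numerical sketch. It assumes that formula (20) has the form $q_t = q_i$ times the root, of order equal to the total number of cases, of the susceptible-weighted mean of $(q_x/q_i)^{C_{x,t}}$. The sample figures and the seasonal factor k are hypothetical.

```python
def average_q(cases, susceptibles, q_within, q_between):
    """Average q of one generation under the assumed form of formula (20)."""
    weighted = sum(s * (q / q_between) ** c
                   for c, s, q in zip(cases, susceptibles, q_within))
    return q_between * (weighted / sum(susceptibles)) ** (1.0 / sum(cases))


cases = [6, 15, 30]                 # illustrative values only
susceptibles = [285, 192, 137]
q_within = [0.997, 0.995, 0.992]
q_between = 0.9999

k = 0.999                           # hypothetical seasonal factor applied to every q
base = average_q(cases, susceptibles, q_within, q_between)
seasonal = average_q(cases, susceptibles, [k * q for q in q_within], k * q_between)
print(base, seasonal, seasonal / base)
```

Because every ratio $q_x/q_i$ is unchanged when all the q's are multiplied by the same factor, the printed ratio of the two averages equals k up to rounding error, provided cases and susceptibles are held fixed.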

But considering the populations as stratified we would expect both 
such variation of p; and some cycle in the incidence. The length of 
this natural cycle would depend on the local conditions but it is logical 
to expect that such social movements as referred to above, all of which 
are seasonal and have the effect of rearranging our strata, will force it 
into an annual cycle. We might then be able to explain seasonal 
variation of diseases without invoking the doubtful factor of increased 
pathogenicity or viability of the parasite at certain times of the year 
due to favorable climatic conditions.

DISCUSSION


The epidemic theory proposed by Reed and Frost has proved to be 
of great didactic value in the combined course of Epidemiology 2 and 
Biostatistics 9 of the Johns Hopkins School of Hygiene and Public 
Health. It furnishes a means of systematization of epidemiological ideas 
and brings forth the relative importance of the different factors involved 
in the evolution of an outbreak. 

However, the theory has not been applied to practical fields, where it 
might be extremely useful, as a means of both accurate description of 
epidemic phenomena and interpretation of epidemiological mechanisms. 
Without such application we lack proof of the validity of the theory. 
The main difficulties encountered are due to the narrow range of situa- 
tions which the theory, as presented, will cover. We have pointed out 
that the theory fits many small epidemics occurring in a set of restricted 
circumstances. The accumulation of such evidence, however, is not 
unanswerable proof that the mechanism by which epidemics of com- 
municable disease evolve is the one the theory postulates. So long as 
the theory does not explain all epidemic phenomena and does not fit 
within reasonable limits all epidemic causes we have not approached 
such proof. 

To attain such results allowances have to be made in order to intro- 
duce factors which the original theory did not consider and to account 
for their variability. This entails an increase of the number of para- 
meters, which has to be considered carefully. If such parameters are not 
related to the epidemic phenomena and can be measured independently 
from it they can be accepted without too much suspicion. But if they 
would be directly determined from the epidemic itself extreme caution 
becomes necessary because, given enough such variables, any theory 
will fit any phenomena. 

In our theoretical considerations we dealt, mostly, with factors of 
the first category. Such would be the proportion of recognizable cases, 
the length of the infectious period, the length and distribution of incu- 
bation period and duration of acquired immunity. These factors pertain 
to the disease and not to each specific outbreak. Once a reliable value is 
determined for a particular disease, it should be used in the study of 
all epidemics of that disease, unless very strong evidence against it is 
present. The same is true of the division of a population into strata. 
When possible to measure, the strata should depend entirely on the social
characteristics of the community and the same rules of division should
be applied to all populations. 

The factors belonging to the second category would appear in the 
variable p or q. This, even in the simplest form of the theory, is determined
merely on the evidence furnished by epidemiological data. We
give it the value that will make the observed facts fit the theory most 
exactly. In considering the effect of stratified populations we had to 
multiply the number of such variables. However, if we are able to count, 
without bias, the number of strata and the cases and susceptibles in 
each one of them, and if the same divisions of the population will apply 
to several epidemics of different diseases, the increase of the dependent 
variables will not be a serious disadvantage. 

We do not think that many more theoretical developments than those 
considered here will prove useful, before confirmation of the theory’s 
validity has been obtained. Such confirmation would have, as stated 
above, to be looked for in the application of the theory to actual epi- 
demics. We think that the developments offered in this paper cover a 
wide range of situations to allow for the collection of extensive evidence, 
provided that the independent variables can be counted. Two possible 
sources of data may be considered: natural and experimental epidemics. 
We can not, generally, obtain information on natural epidemics as 
detailed as necessary for this kind of study. Specially organized studies 
would be needed whenever circumstances are complex. Experimental 
epidemiology, hitherto a little explored field of research, might prove 
the ideal way of testing this or any other epidemic theory. In setting 
up an experiment we can measure most variables, like length of incu- 
bation period, of infectiousness and of immunity, numbers of cases and 
of susceptibles, stratification, etc. Similar measurements in human
populations are always difficult and of dubious reliability. The possi- 
bility of repetition of the experiment under circumstances as identical 
as possible should allow us to study the effects of chance variation. 

The study of chance variation is very important. If the formulae 
of our theory fit observed epidemics it means that they are adequate 
mathematical expressions describing the mean expectancy, but not neces- 
sarily that the reasoning leading to the formulation is true. It is this 
reasoning that determines the expected chance variation and if the latter 
should be found to fit the observed variation in actual epidemics a further 
step would be achieved in proving the truth of the theory’s hypothesis 
of the mechanisms of propagation of contagious disease. However, an
explanation of the mechanics of explosive outbreaks of communicable
disease is not all we may expect from an epidemic theory. We know 
that in large populations most diseases are continuously present either 
at a fairly constant level or following a cycle which may be seasonal, or 
cover a range of several years. This is true of the diseases passed 
directly from host to host, the ones our theory deals with, which generally 
have a seasonal cycle, often super-imposed on a longer one. In this case 
we have a biological balance between two populations: the host popu- 
lation and the parasite population. It would be the role of epidemic 
theories to interpret and explain the mechanisms of such balance, how 
it is maintained and why it is disrupted, whether increases and decreases 
of incidence mean a disturbance of the equilibrium caused by some 
extrinsic factor or simply indicate that the balance is not to be reached 
at a static level but at a fluctuant one. The answer to these questions 
in precise and exact terms would greatly extend our understanding of 
the problems of communicable disease and achieve the principal aim of 
epidemiological research in this field. 


SUMMARY 


The main purpose of this paper is to extend the Reed-Frost theory 
of epidemics so as to make it applicable to more complex situations, 
thus fashioning tools with which to study actual epidemics. 

The extensions were developed on a merely theoretical basis and the 
effects studied through stepwise calculations of model epidemics and 
the analysis of these. 

The variations studied are: 


1. Effect of inapparent infections and undercounting of cases. Two 
possibilities are considered : 

a. Most cases are recognizable (or most cases are counted): it is 
shown how the probability of adequate contact varies when computed 
from the apparent data. 

b. Most cases are not recognizable (or most cases are not counted) : 
it is shown that the apparent data seem to fit conditions different from 
the actual ones. 

2. Length of infectiousness and incubation period: it is shown in 
detail how the theory accounts for the development of epidemics in 
successive generations and the distribution of cases in each generation.

3. Variation in the duration of immunity: it is shown how varying
the combination of three of the principal factors involved—size of the 
population of susceptibles, probability of adequate contact, length of 
immunity—will account, according to the theory, for the three types of 
behavior of communicable diseases: isolated outbreaks, endemicity with 
recurrent epidemic waves, and a stable endemic level. 


4. Stratification of the population as to probability of contact: it is 
shown how the probability of contact computed from the over-all results 
varies inversely with the number of cases present. 


5. A discussion is presented of the value of the new dependent and 
independent parameters introduced in the formulae and of the possible 
use of the theory and its extensions in dealing with real and experimental 
situations. Suggestions are made as to the possible course of future 
research in this field.

AN EXAMINATION OF THE REED-FROST
THEORY OF EPIDEMICS * 


BY HELEN ABBEY 
The Johns Hopkins University 


INTRODUCTION 


THE STUDY of the flow of a disease through a population can be
approached by making a set of assumptions about the relations 
among the factors producing the spread of the disease and expressing 
these assumptions in terms of a mathematical model. The model may 
then be tested on actual observations of diseases for which the assump- 
tions are thought to be valid. 

This is a useful approach since if the model adequately fits the data 
it lends support to the underlying assumptions of the model. If certain 
combinations of assumptions give a better fit than others, the simplest 
assumptions which give a good fit provide working leads for further 
study of the actual relationships which have been approximated by the 
model. The estimates of the parameters of the model may be useful 
in comparing different diseases or the same disease under different 
environmental conditions. 

The present investigation is an application of a specific model (the 
Reed-Frost model) to observations of certain acute infectious diseases 
where the assumptions are most likely to be valid. Although there is 
considerable discussion in the literature of epidemic models, very little 
testing of the models on actual observations has been done, partly 
because the necessary data are difficult or impossible to obtain. 

Lowell J. Reed and Wade Hampton Frost, in unpublished work used
in class lectures at Johns Hopkins University, developed a model which
was a modification of one originally proposed by Soper (1927). Soper
had postulated a community in which all individuals had equal susceptibility
to a disease, equal capacity to transmit it, and the power of
passing out of observation when the transmitting period was over. He
considered diseases in which the period of infectiousness is short relative
to the incubation period. The “law of mass action,” which states that
the rate of a chemical reaction is proportional to the product of the active
masses of the substances, was assumed to apply to the transmission of
the disease. Under these conditions, if the time interval is chosen to be
the average incubation period of the disease, the number of cases at
any time period is proportional to the product of cases and susceptibles
in the previous period.

* From the Department of Biostatistics, The Johns Hopkins University School
of Hygiene and Public Health, Baltimore, Md. Department Paper no. 285. This

Reed and Frost modified Soper’s model to make allowances for the 
fact that contact between a given susceptible and two or more cases will 
produce only one new case, an effect which Soper’s theory does not 
consider. Their model is based on the following assumptions: 


The infection is spread directly from infected individuals to 
others by a certain kind of contact (adequate contact) and in no 
other way. 

Any non-immune individual in the group, after such contact 
with an infectious person in a given period, will develop the 
infection and will be infectious to others only within the following 
time period, after which he is wholly immune. 

Each individual has a fixed probability of coming into adequate 
contact with any other specified individual in the group within 
one time interval, and this probability is the same for every 
member of the group. 

The individuals are wholly segregated from others outside the 
group. 

These conditions remain constant during the epidemic. 


If p is the probability of contact between any two specified individuals
in the population in a given interval of time (the period of
infectiousness), then q = 1 - p is the probability of their not having
contact. Contact, or adequate contact, as used by Reed and Frost, is
contact such that, if it occurs between an infectious case and a susceptible,
it will produce a new case. The probability of contact, in this sense,
depends on the susceptibility or resistance of the host, the infectivity
of the parasite, the length of exposure and size of dose necessary to
produce the disease, as well as the environmental conditions necessary
for the transfer of the organism.

TABLE 1
Calculation of a theoretical epidemic from the Reed-Frost model
(p = .05)

TIME      NUMBER      NUMBER OF
PERIOD    OF CASES    SUSCEPTIBLES    CALCULATION OF C_{t+1} AND S_{t+1}
  t         C_t          S_t
  0          1           100          C_1 = 100(1 - .95^{1})  =  5.00 =  5
                                      S_1 = 100 - 5 = 95
  1          5            95          C_2 = 95(1 - .95^{5})   = 21.49 = 21
                                      S_2 = 95 - 21 = 74
  2         21            74          C_3 = 74(1 - .95^{21})  = 48.80 = 49
                                      S_3 = 74 - 49 = 25
  3         49            25          C_4 = 25(1 - .95^{49})  = 22.97 = 23
                                      S_4 = 25 - 23 = 2
  4         23             2          C_5 = 2(1 - .95^{23})   =  1.39 =  1
                                      S_5 = 2 - 1 = 1
  5          1             1          C_6 = 1(1 - .95^{1})    =   .05 =  0
                                      S_6 = 1 - 0 = 1
  6          0             1


If $C_t$ is the number of cases produced at time $t$, then $q^{C_t}$ is the
probability that the specified individual will not have contact with any
of the $C_t$ cases, and $1 - q^{C_t}$ is the probability that he will have
contact with at least one of them. Reed and Frost assumed, as Soper did,
that the infective period is short relative to the incubation period, and
that the time interval is the average length of the incubation period. It
follows that if there are $S_t$ susceptible individuals in the population at
time $t$, the expected number of cases produced at time $t + 1$ is $S_t$
times the probability of contact with at least one case. Or,


$$C_{t+1} = S_t \left(1 - q^{C_t}\right). \tag{1}$$


This equation provides a method of stepwise calculation of cases at 
successive time periods. For example, suppose that the contact rate in 
a population of 100 susceptibles is .05. If one case is introduced into 
this population, the epidemic produced by the assumptions of the model 
will proceed as is shown in Table 1. The first case will produce 5, 
these five will produce 21, etc., until the epidemic ends with one suscep- 
tible remaining in the sixth time period and no new cases being produced. 

Soper’s equation may be regarded as a first approximation to the
Reed-Frost theory. For small numbers of cases relative to the numbers
of susceptibles, the two methods are essentially equivalent.
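
The stepwise calculation of Table 1 can be sketched in a few lines of Python. This is an illustration only and is not part of the original study: the function name `reed_frost_deterministic` is hypothetical, and rounding each expected count to the nearest whole number is an assumption taken from the way Table 1 is laid out.

```python
def reed_frost_deterministic(s0, c0, p, max_periods=50):
    """Stepwise Reed-Frost calculation, C_{t+1} = S_t (1 - q^{C_t}).

    Expected counts are rounded to whole numbers at each step
    (an assumption that mirrors the layout of Table 1).
    Returns a list of (time period, cases, susceptibles).
    """
    q = 1.0 - p
    s, c = s0, c0
    rows = [(0, c, s)]
    for t in range(1, max_periods + 1):
        if c == 0:          # no infectious cases left, epidemic is over
            break
        new_cases = round(s * (1.0 - q ** c))
        s -= new_cases
        c = new_cases
        rows.append((t, c, s))
    return rows


if __name__ == "__main__":
    for t, cases, susceptibles in reed_frost_deterministic(100, 1, 0.05):
        print(t, cases, susceptibles)
```

With 100 susceptibles, one introduced case and p = .05, the rows produced agree with the entries of Table 1.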

In the models of Soper and of Reed and Frost as presented above, 
the whole course of the epidemic is determined if the initial conditions 
are known, and may be calculated step by step as previously indicated. 
Reed and Frost, however, considered that an epidemic is not uniquely 
determined by the initial conditions because at each period there are 
variations due to chance. They used a mechanical device to illustrate 
the effect of chance variation on the theoretical epidemics. 

The concept of chance variation can be introduced into the
mathematical model by replacing the equation


$$C_{t+1} = S_t \left(1 - q^{C_t}\right)$$


by the statement that the probability of $C_{t+1}$ cases in the $(t+1)$-th
interval is


$$P(C_{t+1}) = \binom{S_t}{C_{t+1}} \left(1 - q^{C_t}\right)^{C_{t+1}} \left(q^{C_t}\right)^{S_t - C_{t+1}}, \tag{2}$$


where $S_t$ and $C_t$ are the observed numbers of susceptibles and cases,
respectively, in the $t$-th interval. Note that this is an ordinary binomial
probability, where $1 - q^{C_t}$ is the probability of a susceptible becoming
a case in the $(t+1)$-th interval.

Epidemics can be calculated stepwise from this model, with the aid
of random numbers and a table of the cumulative binomial distribution
(National Bureau of Standards, 1949). For example, consider an
epidemic in a population having a contact rate $p$, with $S_0$ susceptibles
and $C_0$ cases in the initial time period. From equation (1), the number
of cases expected in period one is


$$E(C_1) = S_0 \left(1 - q^{C_0}\right).$$


The expected number of cases is written $E(C_1)$ to differentiate it from
$O(C_1)$, the number actually produced.

The observed number of cases $O(C_1)$ is obtained by drawing it at
random from the binomial distribution having a mean $E(C_1)$ and a
probability $1 - q^{C_0}$ of becoming a case in period one. This is done by
going to the binomial table of partial sums. For a specified value of $P$
and of $n$, there is a partial sum corresponding to each value of $r$
between $r = 1$ and $r = n - 1$. You choose a partial sum at random,
thereby choosing an $r$ at random. In the first time period,


$$n = S_0, \qquad P = 1 - q^{C_0}.$$


Now, draw a random number (of seven digits, since the sums are
seven-digit numbers). Locate the sum closest to it, for the $n$ and $P$ given
above, and read off the corresponding $r$. This is the observed number
of cases $O(C_1)$ in the first time period. The observed number of
susceptibles is


$$O(S_1) = S_0 - O(C_1).$$


Proceeding to the next time period, an observed $O(C_2)$ is drawn at
random from the binomial distribution having a mean $E(C_2)$ and a
probability $1 - q^{O(C_1)}$ of becoming a case; that is, from the partial
sum for which


$$n = O(S_1), \qquad P = 1 - q^{O(C_1)}.$$


The $r$ corresponding to this $n$ and $P$ and to a partial sum chosen at
random is the observed number of cases $O(C_2)$. The observed number
of susceptibles is obtained by subtraction:


$$O(S_2) = O(S_1) - O(C_2).$$


The calculations continue in this manner until no new cases are produced. 
For values beyond the range of the binomial tables, the Poisson or 
normal distributions may be used as approximations to the binomial. 
In the sections which follow, a chi-square test is used to determine 
whether the theory adequately fits the observed epidemics in human 
populations. This is a test of whether the variation found to occur in 
actual epidemics is greater than the binomial variation described above. 
The test is described in a later section on the application of the model. 
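
The chance version of the stepwise calculation can be sketched as follows. This is not the procedure of the original study: in place of seven-digit random numbers and the printed table of partial sums, the sketch draws each binomial count with Python's pseudo-random generator, and the function name `reed_frost_stochastic` and the `seed` argument are hypothetical.

```python
import random


def reed_frost_stochastic(s0, c0, p, seed=None, max_periods=1000):
    """Simulate one epidemic from the binomial form of the Reed-Frost model.

    At each step the number of new cases is a binomial draw with
    n = current susceptibles and P = 1 - q**(current cases).
    Returns a list of (cases, susceptibles) per time period.
    """
    rng = random.Random(seed)
    q = 1.0 - p
    s, c = s0, c0
    chain = [(c, s)]
    while c > 0 and len(chain) <= max_periods:
        prob_case = 1.0 - q ** c
        # Binomial draw as a sum of independent Bernoulli trials.
        new_cases = sum(rng.random() < prob_case for _ in range(s))
        s -= new_cases
        c = new_cases
        chain.append((c, s))
    return chain


if __name__ == "__main__":
    for cases, susceptibles in reed_frost_stochastic(100, 1, 0.05, seed=1):
        print(cases, susceptibles)
```

Repeating the call with different seeds gives a set of epidemics that differ only by the binomial variation at each period.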

SOURCES OF DATA


Although there is a great deal of published data on the reported 
cases of infectious diseases, most of this is not useful in testing the 
adequacy of the model because of variable amounts of under-reporting 
of cases, and of lack of information about the number of susceptibles. 

The requirement of a closed population with uniform mixing among 
its members is most nearly met in institutions or within families. In 
these groups also, the size of the population is known, and estimates of 
the number of susceptibles can usually be obtained. Likewise, the dis- 
eases on which the model should be tested first are those which come 
closest to fulfilling the assumptions of the model. Such diseases have 
the following characteristics : 


The period of infectiousness is short relative to the incubation 
period. 

There is uniform susceptibility to the disease before the attack.

A single (adequate) contact between an infectious case and a 
susceptible produces the disease. 

A single attack of the disease produces lasting immunity. 

There are few carriers, or subclinical or missed cases. 

The incubation period is constant. 


The diseases which approximately satisfy these conditions are the acute
infectious diseases of childhood, with measles most nearly fulfilling these
conditions, and German measles and chickenpox also being fairly
satisfactory. The others, such as scarlet fever, poliomyelitis, diphtheria,
mumps and whooping cough, not only do not fit these criteria as well,
but also are of limited usefulness because they usually produce too few
cases in any epidemic period to provide much information on which
to test the theory.

The majority of the epidemics used in this study were obtained from
the Medical Research Council Special Reports, Numbers 227 and 271,
“Epidemics in Schools.” These are reports of a study of the incidence
of epidemic diseases in Naval and boarding schools in England during
the years 1932 to 1939. Table 2 gives some of the characteristics of these
populations. For a more complete description of the populations,
reference may be made to the original reports. The code symbols used
to identify the epidemics in the present paper are the same as those
used in the original reports. The reported number of susceptibles is
not stated in these two studies, but has been obtained by dividing the
total number of cases by the attack rate on susceptibles. The method
by which the prior history of disease was obtained could not be
ascertained from the reports. It is likely that there is considerable variation
in the method among the schools, depending on the routines which were
in operation at the beginning of the study.

In addition, single epidemics have been obtained from each of the 
following sources: 

The measles epidemic labelled “Aycock” is taken from “Immunity
to Poliomyelitis” by W. Lloyd Aycock, in the American Journal of
Medical Sciences, September, 1942. It occurred in a New England 
boys’ boarding school in 1934. 

The Mexico school chickenpox data are from a study done by Bahlke, 
Silverman, and Ingraham on ultra-violet light irradiation which is 
reported in the American Journal of Public Health, October, 1949. 
This was an unirradiated school used as a control. The epidemic 
occurred in the winter of 1945-1946. 

The two epidemics labelled Institution #12 are from “Measles in
Institutions for Children, Part 2, Use of Convalescent Serum” by
Edward S. Godfrey, Jr., in the Journal of Preventive Medicine, January, 
1928. In 1923, an epidemic occurred in an institution for normal 
children in New York State. Ward C, the largest ward which was 
attacked by the disease, has been separated from the total figures for 
the institution for comparative purposes. Although convalescent serum 
was used, the author felt that it had been ineffective in altering the 
progress of the epidemic or the severity of the disease. No report of 
the number of susceptibles is given in the article. 

The tenement epidemic is reported in the Medical Research Council 
Special Report No. 120, “An Inquiry into the Relationship between 
Housing Conditions and the Incidence and Fatality of Measles.” The 
epidemic occurred in the winter of 1925-1926, in seven tenement 
buildings having a total population of 538, of whom 173 were children 
under 10 years of age. The buildings were situated on a short cul-de-sac 
which was used as a playground by the children. 

It should be noted from Table 2 that the populations from which the 
epidemics came differ both in age distribution and in the environmental 
factors which influence the contact rate. These differences must be 
considered in making comparisons of the results obtained from fitting 
the model to the epidemics. 




EXAMINATION OF REED-FROST THEORY 209 


The final set of data (not shown in Table 2) is from “ Measles and 
Scarlet Fever in Providence, R. I... .” by Wilson, Bennett, Allen, and 
Worcester. The figures used are measles cases in Providence, R. I., in 
1923, in families of four or more children having one primary case and 
three susceptibles. They have classified the families into groups which 
are not mutually exclusive so that some families occur in all the groups 
and others in only a few or only one. The definitions of the three 
groups chosen for study here, are given in a later section. 


METHOD OF APPLYING THE MODEL 


Each epidemic was divided into intervals of the length of the incu- 
bation period of the disease in such a way that, in so far as possible, 
there was a clustering of cases in the middle of each interval. The 
Reed-Frost model was then applied to the total counts of cases and 
susceptibles in each interval. 

FIG. 1. NUMBER OF CASES OF MEASLES (AYCOCK EPIDEMIC) BY
DAY OF OCCURRENCE


For example, Figure 1 shows each case in the Aycock epidemic by 
day of occurrence. If the incubation period is taken as twelve days and 
the first case occurs on the fifth day of the interval, there is the following 
distribution of cases in successive twelve-day periods: 


          REPORTED NUMBER    REPORTED NUMBER
TIME      OF CASES           OF SUSCEPTIBLES
  0             1                 117
  1             9                 108
  2            22                  86
  3            61                  25
  4            13                  12
  5             0                  12

An estimate of the contact rate can be obtained from each time
period by substituting these cases and susceptibles in the Reed-Frost
model and solving the resulting equations for $q$. Thus from the first
interval, if the number of susceptibles at time 0 is 117, we have

$$9 = 117\,(1 - q_1^{1})$$

or $q_1 = 0.9231$ and $p_1 = 0.0769$.
In the second time period


$$22 = 108\,(1 - q_2^{9}), \qquad q_2^{9} = 1 - 22/108 = 0.7963,$$


$q_2 = 0.9750$
$p_2 = 0.0250$.
Similarly, $p_3 = 0.0546$ and $p_4 = 0.0120$.
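
The interval-by-interval estimates can be checked with a short sketch. It simply inverts the Reed-Frost equation, q = (1 - C_{t+1}/S_t)^(1/C_t), for each pair of successive periods of the Aycock epidemic; the function name `interval_contact_rates` is hypothetical and the code is an illustration, not the original computation.

```python
# Aycock epidemic, twelve-day periods (from the tabulation in the text).
cases = [1, 9, 22, 61, 13, 0]
susceptibles = [117, 108, 86, 25, 12, 12]


def interval_contact_rates(cases, susceptibles):
    """Solve C_{t+1} = S_t (1 - q**C_t) for q in each interval; return p = 1 - q."""
    rates = []
    for t in range(len(cases) - 1):
        c_t, s_t, c_next = cases[t], susceptibles[t], cases[t + 1]
        if c_t == 0:
            continue  # no infectious cases, so the interval carries no information
        q = (1.0 - c_next / s_t) ** (1.0 / c_t)
        rates.append(1.0 - q)
    return rates


if __name__ == "__main__":
    for t, p in enumerate(interval_contact_rates(cases, susceptibles), start=1):
        print(f"p_{t} = {p:.4f}")
```

The first four values agree with the estimates quoted above to four decimal places; the fifth is zero because no cases followed the thirteen cases of period 4.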


The method which is used for estimating a single contact rate for
the entire epidemic is the “method of maximum likelihood.” This is a
standard estimation procedure, and is equivalent to a weighted average
of estimates calculated at each point. The maximum likelihood method
consists of writing down the “likelihood” of obtaining the observed
epidemic:


$$\text{Likelihood} = L = \prod_{t=0}^{n-1} \binom{S_t}{C_{t+1}} \left(1 - q^{C_t}\right)^{C_{t+1}} \left(q^{C_t}\right)^{S_{t+1}}, \tag{3}$$


where $S$ and $C$ are the observed numbers of susceptibles and cases, and
$q$ is the parameter to be estimated; and in finding the value of $q$ which
makes $L$ a maximum. This value of $q$ is calculated by setting the
derivative of $\log L$ with respect to $q$ equal to zero, and solving the
resulting equation for $q$:

$$\sum_{t=0}^{n-1} \frac{C_t\, C_{t+1}\, q^{C_t}}{q \left(1 - q^{C_t}\right)} - \sum_{t=0}^{n-1} \frac{C_t\, S_{t+1}}{q} = 0.$$

There is no explicit solution, but the equation may be solved by successive
approximations.
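
One way to carry out the successive approximations is sketched below for the Aycock epidemic. The original paper does not say which numerical scheme was used; the ternary search here is an assumption chosen because the log-likelihood is concave in q, and the names `log_likelihood` and `mle_q` are hypothetical. The binomial coefficients are omitted since they do not depend on q.

```python
import math

cases = [1, 9, 22, 61, 13, 0]
susceptibles = [117, 108, 86, 25, 12, 12]


def log_likelihood(q, cases, susceptibles):
    """Log of the likelihood, up to an additive constant not involving q."""
    total = 0.0
    for t in range(len(cases) - 1):
        c_t, c_next, s_next = cases[t], cases[t + 1], susceptibles[t + 1]
        if c_t == 0:
            continue
        if c_next > 0:
            total += c_next * math.log(1.0 - q ** c_t)
        total += c_t * s_next * math.log(q)
    return total


def mle_q(cases, susceptibles, tol=1e-10):
    """Maximise the concave log-likelihood over 0 < q < 1 by ternary search."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, cases, susceptibles) < log_likelihood(m2, cases, susceptibles):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    q_hat = mle_q(cases, susceptibles)
    print(f"q = {q_hat:.4f}, p = {1.0 - q_hat:.4f}")
```

The resulting single contact rate lies between the smallest and largest of the interval-by-interval estimates, as a weighted average should.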

To test whether the observations fit the mathematical model, the
estimated contact rate is substituted in the Reed-Frost equation to
determine expected numbers of cases and susceptibles at each time period:


$$E(C_{t+1}) = O(S_t) \left(1 - q^{O(C_t)}\right),$$
$$E(S_{t+1}) = O(S_t) - E(C_{t+1}).$$


These expected numbers are compared with the observed numbers by a
chi-square test of goodness of fit:


$$\chi^2 = \sum_{t} \left\{ \frac{\left[O(C_{t+1}) - E(C_{t+1})\right]^2}{E(C_{t+1})} + \frac{\left[O(S_{t+1}) - E(S_{t+1})\right]^2}{E(S_{t+1})} \right\}. \tag{5}$$

The chi-square test is applicable here because the expected numbers in
each time period are calculated from the observed numbers in the
previous period, and are therefore independent of the previous expected
numbers. The test measures whether the variation at each point is of
the magnitude of the binomial variation discussed on page 204.
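
The goodness-of-fit statistic can be sketched as follows. The function name `chi_square_fit` is hypothetical, and skipping intervals with no infectious cases or no susceptibles (where the expected numbers would be zero) is an assumption of this illustration rather than a rule stated in the text.

```python
def chi_square_fit(cases, susceptibles, q):
    """Chi-square comparing observed cases and susceptibles with the numbers
    expected from the previous period's observations under the Reed-Frost model."""
    chi2 = 0.0
    for t in range(len(cases) - 1):
        if cases[t] == 0 or susceptibles[t] == 0:
            continue  # expected numbers would be zero; interval is skipped
        exp_cases = susceptibles[t] * (1.0 - q ** cases[t])
        exp_susc = susceptibles[t] - exp_cases
        chi2 += (cases[t + 1] - exp_cases) ** 2 / exp_cases
        chi2 += (susceptibles[t + 1] - exp_susc) ** 2 / exp_susc
    return chi2


if __name__ == "__main__":
    aycock_cases = [1, 9, 22, 61, 13, 0]
    aycock_susceptibles = [117, 108, 86, 25, 12, 12]
    # q = 0.9685 is used only as an illustrative value of the estimate.
    print(chi_square_fit(aycock_cases, aycock_susceptibles, 0.9685))
```

When the observed numbers coincide with the expected ones, as in the first step of Table 1, the statistic is zero.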


RESULTS OF THE APPLICATION OF THE MODEL TO SCHOOL EPIDEMICS 


Tables 3 and 4 and Figure 2 show a comparison of observed and 
expected numbers of cases. In general the discrepancies are as follows: 
(1) There are too few expected cases in the early time periods and too 
many in the later ones. (2) The curve of the expected cases reaches 
its peak one period later than the curve of the observations. (3) The 
discrepancies between the observed and expected numbers, as measured 
by the chi-square test, are very large. 

These differences may be due to errors of observation, either because 
of actual errors in counting susceptibles or cases, or because of a poor 
choice of the intervals into which the observations are grouped. They 
may also be due to a failure of the assumptions of the model to represent 
the actual processes involved in the spread of the disease. These possible 
causes are examined in some detail in the following sections. 


ERRORS IN COUNTING SUSCEPTIBLES 


Counting errors are more likely to occur in the susceptibles than in 
the cases, since the cases were counted as they occurred, while the 
susceptibles, being defined as persons not having a history of the disease, 
were counted from events remembered. 

TABLE 3
Comparison of number of susceptibles reported, with number estimated from the
Reed-Frost theory

                  REPORTED    INITIAL NUMBER         RATIO:          NUMBER OF SUSCEPTIBLES
                  NUMBER      OF SUSCEPTIBLES        ESTIMATED       REMAINING AT END
EPIDEMIC          OF CASES                           TO REPORTED     OF EPIDEMIC
                              Reported  Estimated    SUSCEPTIBLES    Reported  Estimated
MEASLES
KB 1933             111         204       112          0.55             93        1
FB 1934              78         109        78          0.72             31        0
QB 1934             103         135       103          0.76             32        0
CB 1936             110         170       110          0.71             60        0
TB 1938              88         123        88          0.72             35        0
GB 1938              62          80        62          0.78             18        0
Aycock              106         118       106          0.90             12        0
Tenements            35          88        41          0.47             53        6
Inst. Total         184         258*      184         >0.71            <74        0
Inst. Ward C         79         100*       79         >0.79            <21        0
GERMAN MEASLES
BB 1934             258         533       260          0.49            275        2
EB 1934             141         247       141          0.57            106        0
QB 1934             271         427       271          0.63            156        0
TB 1939              72         480        74          0.15            408        2
FG 1939              52         216        52          0.24            164        0
CHICKENPOX
MB 1932              30         166        30          0.18            136        0
CB 1934              40         236        40          0.17            196        0
EB 1937              34         100        34          0.34             66        0
RB 1939              90         194        90          0.46            104        0
Mexico School        75         188        94          0.50            113       19

* Total population. Susceptibility status not reported.


FIG. 2a. COMPARISON OF OBSERVED CASES OF MEASLES WITH CASES CALCULATED
FROM REPORTED SUSCEPTIBLES AND ESTIMATED CONTACT RATE
(Solid lines indicate observed cases.)

FIG. 2b. COMPARISON OF OBSERVED CASES OF GERMAN MEASLES AND CHICKENPOX
WITH CASES CALCULATED FROM REPORTED CASES AND ESTIMATED CONTACT RATE
(Solid lines indicate observed cases.)

FIG. 3a. COMPARISON OF OBSERVED CASES OF MEASLES WITH CASES CALCULATED
FROM ESTIMATED INITIAL SUSCEPTIBLES AND ESTIMATED CONTACT RATE
(Solid lines indicate observed cases.)

FIG. 3b. COMPARISON OF OBSERVED CASES OF GERMAN MEASLES AND CHICKENPOX
WITH CASES CALCULATED FROM ESTIMATED INITIAL SUSCEPTIBLES AND
ESTIMATED CONTACT RATE (Solid lines indicate observed cases.)


The number of susceptibles was therefore assumed to be unknown,
and estimates were made both of the initial number of susceptibles and
the contact rate, using only the observed number of cases in each time
period. These estimates (shown in Tables 3 and 4) were obtained by a
maximum likelihood method similar to that used in estimating the
contact rate from observed cases and known numbers of susceptibles.
From these estimates, expected cases were calculated for each time
period. They are shown in Figure 3 in comparison with the observed
cases. There is no longer any bias evident in the graphs nor in the runs
of signs of observed minus expected cases. The chi-square values have
been greatly reduced, and although they are still significantly large,
this is of less concern since the bias has been removed. As the contact
rate has been defined, it includes all the factors acting to produce the
spread of the disease such as the susceptibility or resistance of the host,
the length of exposure and size of dose necessary to produce the disease,
and the number and kinds of social contacts within the population. The
net effect of all of these factors is assumed to be the same for every
individual in the population. However, these are human populations
made up of individuals who are not precisely alike, but vary in their
biological and social characteristics. This variation will contribute to
the total variation measured in the epidemic, tending to make it larger
than that expected from random effects alone.
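
The joint estimation of the initial number of susceptibles and the contact rate, using only the observed cases, might be sketched as a search over candidate values of the initial number, with the contact rate maximised for each candidate. The paper says only that a similar maximum likelihood method was used, so the grid search, the ternary search and the names `log_likelihood`, `best_q` and `joint_estimate` are all assumptions of this illustration. Here `cases[0]` is the introduced case and the initial number of susceptibles excludes it.

```python
import math


def log_likelihood(q, s0, cases):
    """Full binomial-chain log-likelihood for initial susceptibles s0 and q."""
    total, s = 0.0, s0
    for t in range(len(cases) - 1):
        c_t, c_next = cases[t], cases[t + 1]
        if c_t == 0:
            break
        # log of the binomial coefficient (depends on s0, so it is kept here)
        total += math.lgamma(s + 1) - math.lgamma(c_next + 1) - math.lgamma(s - c_next + 1)
        if c_next > 0:
            total += c_next * math.log(1.0 - q ** c_t)
        total += c_t * (s - c_next) * math.log(q)
        s -= c_next
    return total


def best_q(s0, cases, tol=1e-9):
    """Ternary search for the q that maximises the likelihood at a fixed s0."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if log_likelihood(m1, s0, cases) < log_likelihood(m2, s0, cases):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def joint_estimate(cases, max_extra=200):
    """Search s0 from the number of later cases upward; return (s0, q, loglik)."""
    minimum = sum(cases[1:])  # every later case must have been susceptible
    best = None
    for s0 in range(minimum, minimum + max_extra + 1):
        q = best_q(s0, cases)
        ll = log_likelihood(q, s0, cases)
        if best is None or ll > best[2]:
            best = (s0, q, ll)
    return best


if __name__ == "__main__":
    print(joint_estimate([1, 9, 22, 61, 13, 0]))
```

The upper limit `max_extra` on the search is arbitrary and would need to be widened if the maximum fell at the edge of the range.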

A comparison of columns 2 and 4 of Table 3 shows that the two
epidemics which were not in institutions, i.e., Tenements and Mexico
School, had 14.6 per cent and 20.2 per cent, respectively, of the estimated
susceptibles remaining at the end of the epidemics. Of the institution
epidemics, 14 had no estimated susceptibles remaining, one had one, and
two had two. This implies that in the institution type of population,
there is essentially complete exhaustion of the susceptible population.
In these populations, the estimated susceptibles average 70 per cent of
the reported susceptibles for the measles epidemics, 42 per cent for
German measles, and 28 per cent for chickenpox. Since there is no
method for testing the individuals in a population to see if they are
actually susceptible to these diseases, we can only look at the
estimates to see if they are reasonable in comparison with other available
information.

If the school epidemics do, in fact, exhaust the susceptible population,
then the reported susceptibles who did not acquire the disease must be
assumed not to be susceptible in spite of having no previous history of
the disease. Table 5 shows that in the measles epidemics about 7 per
cent of the population had no history of the disease at the end of the
epidemics. In comparison, a survey of large cities in the United States
by Collins (1942) showed that by age 19, 93 per cent of the population
report prior history of measles. Since very few persons, presumably
less than one per cent, acquire the disease beyond age 19, most of the
remaining 7 per cent may be regarded as not susceptible in spite of
having no history of the disease. Whether they are not susceptible
because of natural immunity or immunity acquired through a forgotten
or unrecognized case or by repeated contact with the disease does not
matter for the purpose of the present paper.

Similar comparisons for chickenpox and German measles do not show
as good agreement. Collins gives 32 per cent of persons 19 years of age
having no history of chickenpox, against 21 per cent of the school
populations having no history of the disease at the end of the epidemic; and
he gives 69 per cent having no history of German measles against 46
per cent for the school populations. If the overcounting of susceptibles
is due to the failure to recognize cases of a disease or to remember its
occurrence, this failure is likely to be of more importance in Collins’
data, which is purely historical, than in the school epidemics where,
during the period of the study, the cases were recorded as they occurred.
This effect would lead to a larger number of reported susceptibles who
were actually immune, in the purely historical data. This is in
accordance with the differences found between Collins’ data and the school
epidemics.


TABLE 5
Per cent of school populations having no history of the disease at the end of the epidemic

                                  % OF POPULATION HAVING
EPIDEMIC          POPULATION      NO HISTORY OF THE DISEASE
                                  AT END OF EPIDEMIC
MEASLES
KB 1933               745              11.3
FB 1934               515               6.6
QB 1934               503               5.4
CB 1936               830               6.6
TB 1936               494               6.5
GB 1938               245               6.9
Aycock                400               3.0
Total               3,732               7.0
GERMAN MEASLES
BB 1934               664              42.0
EB 1934               264              31.8
QB 1934               503              27.4
TB 1939               500              68.0
FG 1939               223              63.7
Total               2,154              45.6
CHICKENPOX
MB 1932               583              23.5
CB 1934               814              24.1
EB 1937               273              22.7
RB 1939               657              16.8
Total               2,327              21.4

CHOICE OF INTERVAL


In order to test the theory, the observations should be grouped into 
intervals so that the cases in any interval are those which are produced 
by contact with cases of the previous interval. Since the actual genera- 
tions of cases were not known in the school epidemics, they were esti- 
mated by dividing the observations into intervals of the length of the 
average incubation period of the disease, centering successive clusters of 
cases in successive intervals. This method may introduce errors both 
because the length of the incubation period is not precisely known and 
because the choice of the center of the interval is a matter of judgment. 

For example, in discussing the method of applying the model to the
Aycock epidemic of measles, a twelve-day incubation period was used,
with the first case occurring on the fifth day of the interval. The model
was also fitted to the cases per generation obtained by assuming the
first case occurred on the 1st, 2nd, etc., up to the 12th day of the first
interval, to determine which of these twelve divisions of the cases into
generations fitted the theory best. All possible thirteen- and fourteen-day
intervals were also tried for the Aycock epidemic, and a range of
incubation periods and starting points for five of the other epidemics.

In general, the length or position of the interval does not affect the 
estimate of the initial number of susceptibles. There is variation in the 
contact rate by interval, but it is less than the variation between epi- 
demics. There is also wide variation in the goodness of fit depending 
on the interval and endpoints chosen. A study of 12-, 13- and 14-day 
intervals for measles shows that the range of chi-square values for 
different endpoints for a given length of incubation period is too large 
to conclude that one length is substantially better than another. For 
example, in the Aycock epidemic, the 12-day intervals have chi-square 
values ranging from 6 to 27, the 13-day intervals from 3 to 48, and the 
14-day intervals from 0.7 to 30. There is therefore no evidence from 
this comparison that any appreciable part of the discrepancy between 
theory and observations is due to having used a 12- rather than a 13- or 
14-day incubation period. 

Although the chi-square values vary widely depending on the position
of the endpoints of the intervals, the smallest values are not particularly
associated with the intervals which would be chosen by inspection of the
data as representing generations of the disease. That is, the intervals
centering successive clusters of cases in the center of successive intervals
are not consistently better than those centering the clusters at the end
of the intervals. Of the intervals which are consistent with the assumed
generation process of the disease, the chi-square values are generally
large, and if the investigation is limited to these intervals, the divergence
of the theory from the observations is not due to the particular choice
among them.


VARIATION OF THE CONTACT RATE WITH TIME 


If the reported number of susceptibles is used to estimate contact 
rates at each time period, these rates in general decrease with time (See 
Fig. 4). Because contact rate as used in this model is affected by the 
susceptibility of the host, the infectivity of the organism, and the social 
conditions in the community, a decline in rate may be accounted for by 
various hypotheses including progressive immunity acquired by repeated 
contact with the disease, decreasing virulence of the organism, changing 
environmental factors, or changing relations among hosts. Some of 
these factors would cause the rate to decline continuously, while others, 
as for example, isolation procedures, would cause a sharp decline in 
the rate at the point where these procedures were introduced. 

To investigate the hypothesis of a continuous decline, an inspection 
of the graphs (Fig. 4) suggests that an exponential function may serve 
as a simple descriptive curve of the rates. Exponential curves were 
fitted to the contact rates for each epidemic and expected cases calculated 
from the variable rates so determined. For the group of epidemics as 
a whole, this procedure was not as effective in improving the fit as 
estimation of the initial number of susceptibles, although the possibility 
that there is a decline in the contact rate with time cannot be ruled out. 
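
A descriptive exponential curve of the kind mentioned above can be fitted in several ways; the paper does not state which was used. The sketch below assumes ordinary least squares on the logarithms of the rates, p_t = a exp(b t), and uses the four interval estimates quoted earlier for the Aycock epidemic. The function name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(rates):
    """Least-squares fit of log(rate) = log(a) + b * t for t = 0, 1, 2, ...

    All rates must be positive. Returns (a, b).
    """
    n = len(rates)
    ts = list(range(n))
    ys = [math.log(r) for r in rates]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * t_mean)
    return a, b


if __name__ == "__main__":
    aycock_rates = [0.0769, 0.0250, 0.0546, 0.0120]
    a, b = fit_exponential(aycock_rates)
    print(f"a = {a:.4f}, b = {b:.4f}")
```

A negative value of b corresponds to a contact rate that declines from one period to the next.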

If the decline is largely due to isolation procedures introduced after
the first case is discovered, as the sharp drop in contact rate between the
first and second period suggests, a better fit of the model should be
obtained by estimating the constant contact rate exclusive of the first
interval. When these estimates (and the reported number of
susceptibles) are used to calculate expected numbers of cases, the fit of the
model, as measured by chi-square, is improved in the three epidemics
where the decline in rate between the first and second periods is greatest
(compare Table 4 with Table 6). In all of the other epidemics,
however, omitting the first interval is not nearly as effective as estimating
the initial number of susceptibles. It therefore appears that the change
in contact rate in this interval is not in general sufficient to account for
the failure of the model to fit the data.


VARIATION IN THE CONTACT AMONG INDIVIDUALS 


If the population in which an epidemic occurs is actually a group
of smaller populations with some cross-contact among them, this will
introduce a source of variation not considered in the model. It would
be desirable, therefore, to divide the school populations into their
probable sub-populations, as for example, school grade or dormitory.
However, the information needed to permit such a breakdown is not available.


TABLE 6
Goodness of fit of the Reed-Frost model to reported cases and reported susceptibles,
omitting the first interval

EPIDEMIC            χ²      d.f.    P(χ²)        SIGNS*
MEASLES
KB 1933            62.5      4      < .00001     ++---
FB 1934           112.3      2      < .00001     [illegible]
QB 1934            58.9      6      < .00001     [illegible]
CB 1936            14.0      2        .0009      [illegible]
GB 1938           111.7      2      < .00001     [illegible]
Aycock             43.0      3      < .00001
Tenements          13.6      8        .09        +-+-+-0+-
GERMAN MEASLES
BB 1934           148.4      4      < .00001
EB 1934           226.8      3      < .00001     ++--
TB 1939            40.0      5      < .00001     0+----
CHICKENPOX
MB 1932             3.1      3        .38        ++--
CB 1934            40.9      3      < .00001     ++--
EB 1937            13.9      3        .009       +++-
RB 1939            49.9      4      < .00001     ++---
Mexico School      31.0     10        .0016      [illegible]

* Sign of observed cases minus expected cases in each generation.

Some information as to the effects which this variation might produce
may be obtained by calculating a theoretical epidemic in a population
which is assumed to be made up of sub-populations with some
cross-contact among them. Thus, consider a population made up of two
sub-populations A and B, with some mixing between them. Let $p$ be the
probability that an individual has adequate contact with a member of
his own group, and $P$ be the probability that he has adequate contact
with a member of the other group, with $q = 1 - p$ and $Q = 1 - P$.
In time $t + 1$, the probability that an individual in population A
escapes the $C_t^A$ cases in population A and the $C_t^B$ cases in
population B is


$$q^{C_t^A} \, Q^{C_t^B},$$


and the probability that he meets at least one case (from either
population A or B) is

$$1 - q^{C_t^A} \, Q^{C_t^B}.$$


Then the number of cases produced in population A at time $t + 1$ is


$$C_{t+1}^A = S_t^A \left(1 - q^{C_t^A} \, Q^{C_t^B}\right). \tag{6}$$

Similarly, the number of cases produced in population B at time $t + 1$ is

$$C_{t+1}^B = S_t^B \left(1 - q^{C_t^B} \, Q^{C_t^A}\right).$$


An epidemic starts with the introduction of a case into one of the
populations. Stepwise calculation of the epidemics in the two
populations proceeds until there are no new cases in either A or B. The
two epidemics are then added together to produce an apparent single
epidemic.

Epidemics were calculated in this manner for values of the contact 
rate between populations, P, ranging from zero when the two populations 
are completely separated, to p, the point where the two become one 
homogeneous population. 
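
The two-population calculation can be sketched in the same stepwise manner as Table 1. The rounding of expected counts to whole numbers at each step is an assumption borrowed from Table 1 (the text does not say how fractional cases were handled here), and the function and argument names are hypothetical.

```python
def two_group_epidemic(s_a, s_b, c_a, c_b, p_within, p_between, max_periods=200):
    """Stepwise Reed-Frost calculation for two sub-populations A and B.

    p_within  : contact probability with a member of one's own group (p)
    p_between : contact probability with a member of the other group (P)
    Returns a list of (cases_A, cases_B, susceptibles_A, susceptibles_B).
    """
    q, big_q = 1.0 - p_within, 1.0 - p_between
    rows = [(c_a, c_b, s_a, s_b)]
    for _ in range(max_periods):
        if c_a == 0 and c_b == 0:
            break  # no new cases in either group
        escape_a = q ** c_a * big_q ** c_b   # member of A escapes all cases
        escape_b = q ** c_b * big_q ** c_a   # member of B escapes all cases
        new_a = round(s_a * (1.0 - escape_a))
        new_b = round(s_b * (1.0 - escape_b))
        s_a, s_b = s_a - new_a, s_b - new_b
        c_a, c_b = new_a, new_b
        rows.append((c_a, c_b, s_a, s_b))
    return rows


if __name__ == "__main__":
    # One case introduced into A; P is varied between 0 and p.
    for cross in (0.0, 0.01, 0.05):
        rows = two_group_epidemic(100, 50, 1, 0, 0.05, cross)
        print(cross, [a + b for a, b, _, _ in rows])
```

With P = 0 the second group is never reached and the combined epidemic reduces to the single epidemic of Table 1 in group A.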

Table 7 shows the results of testing the fit of the Reed-Frost model 
to the combined epidemics, a single contact rate and an initial number 
of susceptibles being estimated from the total cases in each time period 
in the two populations. The deviations of the theory from the combined 
epidemics are too small to detect the variation in contact rate which 
had been inserted. When random variation is added to the epidemics 
at each time period (Table 7), the model still fits the combined epidemics 
so well that they are indistinguishable from single ones. The order of 
magnitude of these deviations is much less than that which occurs in 
the school epidemics. 

Because this experimental evidence has been obtained for only a very
limited set of conditions, i.e., two populations with selected contact rates
within and between them, the results can be no more than suggestive.
There is no evidence here that large deviations are due to variation in
the contact rate within a population. However, to obtain conclusive
evidence a more realistic model of the conditions likely to be found in
an actual community is needed, for example, one giving each individual
several contact rates.


APPLICATION TO FAMILY DATA 


The family data presented in this section are from a study by Wilson
et al. (1939), of cases of measles in Providence, Rhode Island. They have
tabulated the kinds of epidemics which develop in families having one
primary case and three susceptibles, classifying the families according to
several criteria, three of which have been selected for analysis here.
These are, using Wilson’s code numbers:


VIII—Four-child families with three susceptibles of all ages under 
22, including infants. 
IX—Four-child families with three susceptibles under ten years
and over 7 months. 


XII—Families with four or more children, with three susceptibles 
of all ages under 22, including infants. 


The family is assumed to be the closed population, and the one case 
and three susceptibles are the initial conditions. There are eight possible 
epidemics which may occur in these populations: 


(1,0,0,0) 1 primary case and no others. 
(1,1,0,0) 1 primary case and 1 secondary case. 
(1,1,1,0) 1 primary case, 1 secondary case, 1 tertiary case. 
(1,2,0,0) 1 primary case, 2 secondary cases. 
(1,1,1,1) 1 primary case, 1 secondary case, 1 tertiary case, 1 quaternary case. 
(1,1,2,0) 1 primary case, 1 secondary case, 2 tertiary cases. 
(1,2,1,0) 1 primary case, 2 secondary cases, 1 tertiary case. 
(1,3,0,0) 1 primary case, 3 secondary cases. 
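
The eight chains listed above can be enumerated mechanically. The following is a minimal sketch, assuming the usual chain-binomial form of the Reed-Frost model, in which a susceptible exposed to c concurrent cases becomes a case in the next period with probability 1 - (1 - p)^c. The function name `chain_probabilities` and the contact rate 0.7 are illustrative assumptions, not quantities taken from the study.

```python
from math import comb


def chain_probabilities(p, susceptibles=3, primary=1):
    """Map each possible chain, e.g. (1, 2, 1, 0), to its probability.

    Assumption: chain-binomial Reed-Frost model. With c cases in one
    period, each remaining susceptible becomes a case in the next
    period with probability 1 - (1 - p)**c.
    """
    q = 1.0 - p
    length = susceptibles + 1          # longest chain: one new case per period
    result = {}

    def extend(chain, s_left, prob):
        cases = chain[-1]
        if cases == 0 or s_left == 0:  # the epidemic has ended
            padded = tuple(chain) + (0,) * (length - len(chain))
            result[padded] = prob
            return
        attack = 1.0 - q ** cases
        for new in range(s_left + 1):
            step = comb(s_left, new) * attack ** new * (1.0 - attack) ** (s_left - new)
            extend(chain + [new], s_left - new, prob * step)

    extend([primary], susceptibles, 1.0)
    return result


if __name__ == "__main__":
    # Illustrative contact rate only; not an estimate from the data.
    for chain, prob in sorted(chain_probabilities(0.7).items()):
        print(chain, round(prob, 4))
```

Multiplying each probability by the number of families in a class gives expected numbers of families of the kind compared with the observations.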

Expected numbers of families having each of these kinds of epidemics 
have been calculated from reported cases and susceptibles, assuming a 
constant contact rate within the families. Table 8, columns 2, 3, and 4, 
shows a comparison of observed and expected numbers of cases. The 
discrepancies are large. When the cases are added together by time 
period (Table 9, col. 2, 3, 4) it is seen that there is a bias in the sense 
that too many of the observed epidemics end in the first period. This 
is similar to the effect observed in the school epidemics. 


TABLE 8 
Observed and expected numbers of families having each possible kind of measles 
epidemic 

Expected numbers calculated from: 
Estimated contact rate p; 
Estimated contact rate p, and estimated proportion λ, of families with three 
susceptibles; 
Estimated contact rate p1 in the first period, and p2 in the later periods. 

EXPECTED 
NUMBER OF 
CALCULATED 
1 2. 2.3 Ni 
0 2. 12.6 i 
im 1 12. 5.8 a 
83 36. 38.0 
4 1. 1.7 
10 5. 1.1 
34 34. 25.7 
Total 151 151.0 
1 4 0 
1 3 0 
| 1 1 3 
1 8 9 
1 4 1 7 
| 1 3 3 a 
1 10 28 
1 67 53 
| 100 99 i 
XI 
| 1,00 0 4 
4 8 

Wilson, using a somewhat different mathematical model, calculated 
expected numbers of each kind of epidemic and found discrepancies 
which are essentially the same as those found when using the Reed-Frost 
model. He does not investigate the possible causes of the discrepancies, 
but concludes that “with the discrepancy so great between the 
observations and prediction . . . we may definitely assert that the theory 
is non-applicable to Providence data for measles.” We have investigated 
some of the possible sources of variation, as was done with the school 
epidemics, to see if there are modifications which improve the fit of the 
theory to the data. 


TABLE 9 
Observed and expected cases of measles in families by time of occurrence 

Expected numbers calculated from: 
Estimated contact rate p; 
Estimated contact rate p, and estimated proportion λ, of families with three 
susceptibles; 
Estimated contact rate p1 in the first period, and p2 in the later periods. 

                   EXPECTED NUMBER OF CASES CALCULATED
TIME    OBSERVED   FROM p             FROM p AND λ       FROM p1 AND p2
PERIOD     CASES   EXPECTED     χ²    EXPECTED     χ²    EXPECTED     χ²
VIII
1            340        288   25.8         337    0.1         340    0.0
2             26        106  115.6          58   35.3          35    3.4
3              4         11    8.1           2    2.1           2    2.1
Total        370        405  149.5*        397   37.5†        377    5.5†
IX
1            248        231    5.4         244    0.4         248    0.0
2             21         58  148.1          39   27.4          30    6.4
3              4          3    0.5           1    9.6           2    2.2
Total        273        292  154.0*        284   37.4†        280    8.6
XII
1            519        431   42.5         499    2.4         519    0.0
2             48        175  206.6         113   68.7         106   59.3
3              8         22   10.6           5    1.9           9    0.1
Total        575        628  259.7*        617   73.0†        634   59.4†

* χ² has 2 degrees of freedom. 
† χ² has 1 degree of freedom. 


ERRORS IN COUNTING SUSCEPTIBLES 


Although each epidemic of itself contains too little information to 
provide an estimate of the initial number of susceptibles, they may be 
combined by assuming that some proportion of the families actually had 
three susceptibles, the remainder having only two. This proportion may 
be expressed as an unknown parameter to be estimated jointly with the 
contact rate. To test whether this assumption improves the fit, the two 
parameters are used to calculate the expected numbers of families 
(Table 8) and expected numbers of cases (Table 9). The improvement 
in fit is substantial, but the discrepancies are still large. 
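
A minimal sketch of the mixture described above, assuming the chain-binomial form of the Reed-Frost model: a proportion `lam` of families is taken to have three true susceptibles and the remainder two, and the expected number of families for each chain is the weighted sum of the two sets of chain probabilities. The function names and parameter values are illustrative assumptions, and the joint estimation of the two parameters is not shown.

```python
from math import comb


def chain_probabilities(p, susceptibles, length=4):
    """Reed-Frost chain probabilities for one primary case.

    Chains are padded with zeros to `length` periods so that families
    with two and with three susceptibles share the same labels.
    """
    q = 1.0 - p
    result = {}

    def extend(chain, s_left, prob):
        cases = chain[-1]
        if cases == 0 or s_left == 0:  # the epidemic has ended
            result[tuple(chain) + (0,) * (length - len(chain))] = prob
            return
        attack = 1.0 - q ** cases
        for new in range(s_left + 1):
            step = comb(s_left, new) * attack ** new * (1.0 - attack) ** (s_left - new)
            extend(chain + [new], s_left - new, prob * step)

    extend([1], susceptibles, 1.0)
    return result


def mixed_expected_families(p, lam, n_families):
    """Expected families per chain when a share `lam` of the families
    truly has three susceptibles and the remainder only two."""
    three = chain_probabilities(p, susceptibles=3)
    two = chain_probabilities(p, susceptibles=2)
    return {
        chain: n_families * (lam * prob + (1.0 - lam) * two.get(chain, 0.0))
        for chain, prob in three.items()
    }


if __name__ == "__main__":
    # Hypothetical values of p and lam, chosen only for illustration.
    for chain, n in sorted(mixed_expected_families(0.7, 0.6, 100).items()):
        print(chain, round(n, 1))
```

Chains with three secondary or later cases in total can arise only in the three-susceptible families, so lowering `lam` shifts expected families toward the shorter chains.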

It is noteworthy that the best fit of the model to the observations is 
obtained when the classification of families limits the susceptibles to the 
ages where they can be most accurately counted, that is, to children 
between the ages of six months and ten years (IX); and that the fit is 
poorest when they can be least accurately counted (XII). 


VARIATION IN THE CONTACT RATE WITH TIME 


The contact rates, estimated separately for each interval, are as 
follows: 


CONTACT RATE 

TIME PERIOD VIII IX XII 
1 .822 
2 .208 .406 .201 
3 .571 .800 .471 


There is a sharp decline in the rates between the first and second 
periods, similar to that which was observed in a number of the school 
epidemics. Although the rates rise again in the last period, this rise 
may not have much significance because (1) it is based on small 
numbers of observations (.800 is based on 5 families), and (2) Wilson, 
noting that the attack rates in this period increased, stated that in 
classifying cases as primary, secondary, etc., doubtful cases were 
classified with the longer chains rather than the shorter. The effect of 
this method of classification is to increase the contact rates in the last 
interval. 

Considering, as was done in the school epidemics, the effect of a drop 
in contact rate after the primary case has been discovered, a joint esti- 
mate was made of two contact rates, one in the first period, and the other 
in the second and third periods. From these estimates, expected numbers 
of families were calculated (Table 8) and expected numbers of cases 
(Table 9). Tests of goodness of fit show that the assumption of two 
contact rates improves the fit of the model over that obtained when a 
single, constant contact rate is used, and is somewhat more effective than 
is the assumption of a constant contact rate and an estimated number 
of susceptibles. 
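
The two-rate assumption can be sketched in the same way. The following minimal example, again assuming the chain-binomial form of the Reed-Frost model, applies one contact rate to exposure to the primary case and a second rate in all later periods. The function name and the values 0.8 and 0.3 are illustrative assumptions, not the joint estimates referred to above.

```python
from math import comb


def chain_probabilities_two_rates(p1, p2, susceptibles=3):
    """Chain probabilities when contacts in the first period occur at
    rate p1 and contacts in all later periods at rate p2."""
    length = susceptibles + 1
    result = {}

    def extend(chain, s_left, prob):
        cases = chain[-1]
        if cases == 0 or s_left == 0:  # the epidemic has ended
            result[tuple(chain) + (0,) * (length - len(chain))] = prob
            return
        # Only exposure to the primary case (first period) uses p1.
        p = p1 if len(chain) == 1 else p2
        attack = 1.0 - (1.0 - p) ** cases
        for new in range(s_left + 1):
            step = comb(s_left, new) * attack ** new * (1.0 - attack) ** (s_left - new)
            extend(chain + [new], s_left - new, prob * step)

    extend([1], susceptibles, 1.0)
    return result


if __name__ == "__main__":
    # Hypothetical rates: a high first-period rate followed by a drop.
    for chain, prob in sorted(chain_probabilities_two_rates(0.8, 0.3).items()):
        print(chain, round(prob, 4))
```

Setting `p1` equal to `p2` recovers the single constant contact rate, so the constant-rate model is a special case of this one.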


VARIATION IN THE CONTACT RATE AMONG INDIVIDUALS 


If the kinds of epidemics are considered separately, it is seen that 
neither the assumption of overcounting of susceptibles nor a contact rate 
changing with time is adequate to account for the discrepancies which 
occur. Because the school data are single epidemics and the family 
data are composed of 151 (or 100 or 249) epidemics, the latter provide 
an additional source of variation not previously considered. This is the 
variation among families in the contact rate. To see whether this varia- 
tion contributes significantly to the chi-square values of Tables 8 and 9, 
the intervals have been considered separately. For example, consider 
only the first time period. It is seen from Table 8 that among the 100 
Type IX families, there were 4 who had no cases in the first period, 11 
who had one, 18 who had two, and 67 who had three. These data pro- 
vide an estimate of the assumed constant contact rate which can be used 
to calculate numbers of families expected from the assumptions of the 
model. Table 10 shows that the deviations of the observations from the 
expected numbers are large, and represent an appreciable portion of the 
total variation in the epidemic. Therefore a possible explanation for 
much of the discrepancy between the observations and the model may be 
the variation among families in their contact rates. 
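
The first-period comparison for the Type IX families can be sketched as a binomial goodness-of-fit calculation. The observed counts are those quoted in the text (4, 11, 18, and 67 families with 0, 1, 2, and 3 cases). The pooled proportion used as the estimate of the contact rate is an assumption of this sketch, so the figures it yields need not coincide exactly with those of Table 10.

```python
from math import comb

# Type IX families by number of cases in the first period (from the text).
observed = {0: 4, 1: 11, 2: 18, 3: 67}
SUSCEPTIBLES = 3


def first_period_fit(observed, susceptibles=SUSCEPTIBLES):
    """Fit one constant contact rate to first-period counts.

    Returns the estimated rate, the expected number of families for
    each number of cases, and the chi-square statistic.
    """
    families = sum(observed.values())
    cases = sum(k * n for k, n in observed.items())
    # Assumed estimator: pooled proportion of susceptibles attacked.
    p_hat = cases / (susceptibles * families)
    expected = {
        k: families * comb(susceptibles, k) * p_hat ** k * (1 - p_hat) ** (susceptibles - k)
        for k in observed
    }
    chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
    return p_hat, expected, chi_square


if __name__ == "__main__":
    p_hat, expected, chi_square = first_period_fit(observed)
    print("estimated contact rate:", round(p_hat, 3))
    for k in sorted(expected):
        print(k, observed[k], round(expected[k], 1))
    print("chi-square:", round(chi_square, 1))
```

An excess of families at both extremes (no cases and three cases) relative to the binomial expectation is the pattern that variation in contact rate among families would produce.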

The application of the model to epidemics within families has shown 
three sources of variation in the data which may be of importance in 
causing discrepancies between the theory and the observations. These 
are the overcounting of initial susceptibles, the decline in contact rate 
with time, and the variation in contact rate among families. The first 
two of these agree with the findings of the application of the model to 
school epidemics. The third is an additional source of variation which 
could not be considered in the school epidemics. 


TABLE 10 
Observed and expected numbers of families having given numbers of cases in the first and 
second time periods (contact rates estimated separately for each interval) 

           VIII                      IX                        XII
NUMBER     NUMBER OF FAMILIES        NUMBER OF FAMILIES        NUMBER OF FAMILIES
OF CASES   Observed Expected     χ²  Observed Expected     χ²  Observed Expected     χ²
First time period
0                10      2.3   25.1         4      0.6   21.1        20      7.1   23.6
1                17     21.2    0.8        11      7.8    1.3        46     48.3    0.1
2                49     63.7    3.4        18     36.0    9.0        76    110.1   10.6
3                75     63.8    1.9        67     55.6    2.3       107     83.5    6.6
Total           151    151.0   31.2‡      100    100.0              249    249.0   40.9‡
Second time period*
0†               43     41.4    0.1        11     10.2    0.1        81     77.8    0.1
1                20     23.9    0.6        15     17.0    0.2        34     42.3    1.6
2                 3      0.7    0.7         3      1.8    0.8         7      1.9   14.2
Total            66     66.0    1.4§       29     29.0    1.1§      122    122.0   15.9§

* Expected numbers based on observed numbers in first interval. 
† Does not include families having no cases in first interval. 
‡ χ² has 2 degrees of freedom. 
§ χ² has 1 degree of freedom. 


SUMMARY 


The Reed-Frost theory of the spread of epidemics has been tested on 
two series of observations derived from disease and population conditions 
which approximate the assumptions of the theory. The first series is 
composed of epidemics of measles, chickenpox and German measles, 
chiefly in boarding school populations, where each school represents a 
closed universe. The second series consists of epidemics of measles in 
families, each of which is a universe having four (reported) susceptible 
members. 

The model was fitted to the reported cases by generation of the disease 
and reported number of susceptibles at the beginning of the epidemic. 
The theory fails to fit either series of observations, the discrepancy 
between the theory and the data being consistently in the direction of 
more observed than theoretical cases in the early generations, and fewer 
observed than theoretical ones in the later periods. 

Removal of this bias, and a substantial improvement in the goodness 
of fit as measured by chi-square tests, is obtained when the number of 
susceptibles is assumed to be unknown and estimated from the observed 
numbers of cases only. The estimated number of susceptibles is in most 
instances the total number of cases observed, implying that the popu- 
lation is exhausted of susceptibles. Collateral evidence is available to 
support the hypothesis that the difference between the reported and 
estimated numbers of susceptibles may be due to the inclusion, in the 
reported numbers, of persons who are not in fact susceptible to the disease. 

A second hypothesis which also improves the fit of the theory to the 
observations is that the contact rate declines with time. This is somewhat 
less effective than the assumption of overcounting of susceptibles, but on 
the basis of the present analysis, it cannot be entirely ruled out as a 
source of the discrepancies between the model and the data. 

Other sources of variation which were investigated are: variation in 
fit depending on whether 12, 13, or 14-day incubation periods are used, 
variation in fit depending on whether or not the epidemic is divided 
into periods appearing to represent the true generations of the disease, 
and the amount of error introduced when the population is not homo- 
geneous but composed of sub-populations with some contact among them. 
The evidence in this paper does not suggest that any of these factors 
are likely to be important sources of the discrepancies between the theory 
and the observations. 